DNA sequences useful for the synthesis of carotenoids.

Disclosed are DNA sequences which are useful for the synthesis of carotenoids such as lycopene,
β-carotene, zeaxanthin or zeaxanthin-diglucoside, that is, DNA sequences encoding carotenoid
biosynthesis enzymes. These DNA sequences are the sequences ① - ⑥, respectively, shown in the
specification.

Also disclosed is a process for producing a carotenoid compound which is selected from the group
consisting of prephytoene pyrophosphate, phytoene, lycopene, β-carotene, zeaxanthin and
zeaxanthin-diglucoside, which comprises transforming a host with at least one of the DNA sequences
① - ⑥ described above and culturing the transformant.

DNA SEQUENCES USEFUL FOR THE SYNTHESIS OF CAROTENOIDS



BACKGROUND OF THE INVENTION 



Field of the Art

The present invention relates to DNA sequences which are useful for the synthesis of carotenoids such
as lycopene, β-carotene, zeaxanthin or zeaxanthin-diglucoside.

The present invention also relates to processes for producing such carotenoid compounds. 



Related Art 

Carotenoids are distributed widely in green plants. They are yellow-orange-red lipids which are also
present in some molds, yeasts and so forth, and have recently received increased attention as natural
coloring materials for foods. Among these carotenoids, β-carotene is a typical one, which is used as a
coloring material and also as a precursor of vitamin A in mammals. Its use as a component for preventing
cancer has also been examined [see, for example, SHOKUHIN TO KAIHATSU (Foods and Development),
24, 61-65 (1989)]. Because carotenoids such as β-carotene are widely distributed in green plants, plant
tissue culture has been examined for the development of a method for producing carotenoids in a large
amount which is free from the influence of the natural environment [see, for example, Plant Cell Physiol.,
12, 525-531 (1971)]. Attempts have also been made to find a microorganism such as a mold, yeast or green
alga which is inherently highly productive of carotenoids and to produce carotenoids in a large amount
with such a microorganism [see, for example, The Abstract of Reports in the Annual Meeting of NIPPON
HAKKO KOGAKU-KAI of 1988, page 139]. However, neither of these methods has so far succeeded in
producing β-carotene at a productivity which exceeds that of the synthetic method used in the commercial
production of β-carotene. It would be very useful to obtain a gene group which participates in the
biosynthesis of carotenoids, because it would then be possible to produce carotenoids in a large amount by
introducing the gene group, reconstructed so that the proper genes in it are expressed in a large amount,
into an appropriate host such as a plant tissue culture cell, a mold, a yeast or the like which originally
produces carotenoids. Such a development in technology may lead to a method of producing β-carotene
superior to the synthetic method and to a method of producing useful carotenoids other than β-carotene
in a large amount.

Furthermore, obtaining the gene group participating in the biosynthesis of carotenoids will make it
possible to synthesize carotenoids in a cell or an organ which produces no carotenoid, which will add new
value to organisms. For example, several reports have recently been made on creating flower colors which
cannot be found in nature by genetic manipulation of flowering plants [see, for example, Nature, 330,
677-678 (1987)]. The color of flowers is developed by pigments such as anthocyanin or carotenoids.
Anthocyanin is responsible for flower colors in the red-violet-blue spectrum, and carotenoids are
responsible for flower colors in the yellow-orange-red spectrum. The gene of the enzyme for synthesizing
anthocyanin has been elucidated, and the aforementioned reports on creating a new flower color concern
anthocyanin. On the other hand, there are many flowering plants which have no bright yellow flowers
because their petals lack the ability to synthesize carotenoids (e.g. petunia, saintpaulia (African violet),
cyclamen, Primula malacoides, etc.). If suitable genes from the gene group for the biosynthesis of
carotenoids, reconstructed so as to be expressed in the petals, are introduced into these flowering
plants, flowering plants having yellow flowers will be created.

However, the enzymes for synthesizing carotenoids and the genes coding for them have scarcely been
elucidated at present. The nucleotide sequence of a gene group participating in the biosynthesis of a kind
of carotenoid has been elucidated only recently, and only in the photosynthetic bacterium Rhodobacter
capsulatus [Mol. Gen. Genet., 216, 254-268 (1989)]. This bacterium, however, synthesizes the acyclic
xanthophyll spheroidene via neurosporene without cyclization and thus cannot synthesize general
carotenoids such as lycopene, β-carotene and zeaxanthin.

Prior art relating to yellow pigments or carotenoids of Erwinia species is disclosed in J. Bacteriol.,
168, 607-612 (1986), J. Bacteriol., 170, 4675-4680 (1988) and J. Gen. Microbiol., 130, 1623-1631 (1984).
The first of these references discloses the cloning of a gene cluster coding for yellow pigment
synthesis from Erwinia herbicola Eho 10 ATCC 39368 as a 12.4 kilobase pair (kb) fragment, but gives no
nucleotide sequence of the 12.4 kb fragment. The second reference discloses the yellow pigment
synthesized by the cloned gene cluster, which is indicated to be a carotenoid by the analysis of its
UV-visible spectrum. The last reference indicates, from the observation that the yellow pigment is not
produced on curing the large plasmid, that the gene participating in the production of a yellow pigment
is present on a 260 kb large plasmid contained in Erwinia uredovora 20D3 ATCC 19321, and further
discloses that the pigment is a carotenoid from the analysis of its UV-visible spectrum.

However, the chemical structures of the carotenoids produced by the Erwinia species and of their
metabolic intermediates, the enzymes participating in their synthesis and the nucleotide sequences of the
genes encoding these enzymes remain unknown at present.

DISCLOSURE OF THE INVENTION 




Outline of the Invention 

The object of the present invention is to provide DNA sequences which are useful for the synthesis of
carotenoids such as lycopene, β-carotene, zeaxanthin or zeaxanthin-diglucoside, that is, DNA sequences
encoding carotenoid biosynthesis enzymes.

In other words, the DNA sequences useful for the synthesis of carotenoids according to the present
invention are the DNA sequences ① - ⑥ described in the following (1) - (6).

(1) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting
prephytoene pyrophosphate into phytoene and whose amino acid sequence corresponds substantially to
the amino acid sequence from A to B shown in Figs. 1-(a) and (b) (DNA sequence ①);

(2) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting
zeaxanthin into zeaxanthin-diglucoside and whose amino acid sequence corresponds substantially to the
amino acid sequence from C to D shown in Figs. 2-(a) and (b) (DNA sequence ②);

(3) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting lycopene
into β-carotene and whose amino acid sequence corresponds substantially to the amino acid sequence
from E to F shown in Figs. 3-(a) and (b) (DNA sequence ③);

(4) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting phytoene
into lycopene and whose amino acid sequence corresponds substantially to the amino acid sequence from
G to H shown in Figs. 4-(a), (b) and (c) (DNA sequence ④);

(5) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting
geranylgeranyl pyrophosphate into prephytoene pyrophosphate and whose amino acid sequence
corresponds substantially to the amino acid sequence from I to J shown in Figs. 5-(a) and (b) (DNA
sequence ⑤); and

(6) a DNA sequence encoding a polypeptide which has an enzymatic activity for converting
β-carotene into zeaxanthin and whose amino acid sequence corresponds substantially to the amino acid
sequence from K to L shown in Fig. 6 (DNA sequence ⑥).
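
The six conversions listed in (1) - (6) form a single linear chain of reactions. As an illustrative sketch
only (the dictionary layout, the function name pathway_order and the ASCII spelling "beta-carotene" are
assumptions made for this example and are not part of the specification), the order in which the encoded
enzymes act can be derived from the substrate and product of each conversion:

```python
# Substrate -> product conversions catalysed by the enzymes encoded by
# DNA sequences (1)-(6), transcribed from the list in the text.
CONVERSIONS = {
    1: ("prephytoene pyrophosphate", "phytoene"),
    2: ("zeaxanthin", "zeaxanthin-diglucoside"),
    3: ("lycopene", "beta-carotene"),
    4: ("phytoene", "lycopene"),
    5: ("geranylgeranyl pyrophosphate", "prephytoene pyrophosphate"),
    6: ("beta-carotene", "zeaxanthin"),
}


def pathway_order(start="geranylgeranyl pyrophosphate"):
    """Return the DNA sequence numbers in the order their enzymes act."""
    # Index each conversion by its substrate so the chain can be followed.
    by_substrate = {sub: (num, prod) for num, (sub, prod) in CONVERSIONS.items()}
    order, compound = [], start
    while compound in by_substrate:
        num, compound = by_substrate[compound]
        order.append(num)
    return order


print(pathway_order())  # [5, 1, 4, 3, 6, 2]
```

Starting from geranylgeranyl pyrophosphate, the chain runs through sequences ⑤, ①, ④, ③, ⑥ and ②, ending at
zeaxanthin-diglucoside.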

Another object of the present invention is to provide processes for producing carotenoid compounds.
More specifically, the present invention also provides a process for producing a carotenoid compound
which is selected from the group consisting of prephytoene pyrophosphate, phytoene, lycopene, β-carotene,
zeaxanthin and zeaxanthin-diglucoside, which comprises transforming a host with at least one of the DNA
sequences ① - ⑥ described above and culturing the transformant.


Effect of the Invention 

The successful acquisition of the gene group (the gene group encoding the biosynthetic enzymes of
carotenoids) useful for the synthesis of carotenoids such as lycopene, β-carotene, zeaxanthin,
zeaxanthin-diglucoside or the like according to the present invention has made it possible to produce
useful carotenoids in large amounts, for example, by creating a plasmid in which the gene(s) can be
expressed in a large amount and employing an appropriate plant tissue culture cell, a microorganism or
the like transformed with the plasmid. It has also made it possible to synthesize carotenoids in cells or
organs which produce no carotenoid, by creating a plasmid in which the gene(s) can be expressed in a
target cell or organ and transforming a suitable host with this plasmid.



DETAILED DESCRIPTION OF THE INVENTION 



The DNA sequences according to the present invention are the aforementioned DNA sequences ① - ⑥,
that is, genes encoding the polypeptides of the respective enzymes which participate in the biosynthesis
reaction of carotenoids, in particular, for example, such polypeptides in Erwinia uredovora 20D3 ATCC
19321.

A variety of gene groups containing a combination of a plurality of these DNA sequences ① - ⑥ can be
expressed in a microorganism, a plant or the like to confer on it the ability to biosynthesize
carotenoids such as lycopene, β-carotene, zeaxanthin, zeaxanthin-diglucoside or the like. The respective
DNA sequences constituting the gene group may all be present on one DNA strand or individually on
different DNA strands, or, optionally, a plurality of the DNA sequences may be present on one DNA strand
and another DNA sequence on another DNA strand.

The aforementioned gene group encodes the polypeptides of a plurality of enzymes participating in the
production of carotenoids. A recombinant DNA is created by incorporating the gene group into a proper
vector and is then introduced into a suitable host to create a transformant. The transformant is cultured
so that a plurality of enzymes participating in the formation reaction of carotenoids are produced, mainly
within the transformant, and the biosynthesis of carotenoids is carried out in the transformant by these
enzymes.

The DNA sequence shown in Figs. 7-(a) to (g), which is an example according to the present invention,
is acquired from Erwinia uredovora 20D3 ATCC 19321 and exhibits, as illustrated in the experimental
example below, no homology in DNA-DNA hybridization with the DNA strand containing the gene group
for synthesizing the yellow pigment of Erwinia herbicola Eho 10 ATCC 39368 (see Related Art described
above).

DNA Sequences encoding the polypeptide of each enzyme 

The DNA sequences of the present invention are the DNA sequences ① - ⑥ (or the DNA strands ① - ⑥),
respectively. Each of the DNA sequences contains a nucleotide sequence encoding a polypeptide
whose amino acid sequence corresponds substantially to the amino acid sequence in one of the
aforementioned specific regions in Figs. 1 - 6 (for example, from A to B in Fig. 1). In this connection, the
term "DNA sequence" means a polydeoxyribonucleic acid sequence having a length. In the present
invention, the "DNA sequence" is defined by the amino acid sequence of the polypeptide which it encodes,
and this amino acid sequence has a definite length as described above, so that each DNA sequence also has
a definite length. However, the DNA sequence contains a gene encoding each enzyme and is useful for the
biotechnological production of the polypeptide, and such biotechnological production cannot be performed
with only the DNA sequence having the definite length; it can be performed when another DNA
sequence with a proper length is linked to the 5'-upstream and/or the 3'-downstream of the DNA sequence.
Therefore, the term "DNA sequence" in the present invention includes, in addition to those having a
definite length (for example, the length of the region A - B in the corresponding amino acid sequence of
Fig. 1), those in the form of a linear DNA strand or a circular DNA strand containing the DNA sequence
having a definite length as a member.

One of the typical forms of each DNA sequence according to the present invention is a plasmid which
comprises the DNA sequence as a part or a member, or a form in which the plasmid is present in a host
such as E. coli. The plasmid, as one of the preferable existing forms of each DNA sequence according to
the present invention, is a combination of the DNA sequence according to the present invention as a
passenger or foreign gene, a replicable plasmid vector present stably in a host, and a promoter
(containing ribosome-binding sites in the case of a procaryote). As the plasmid vector and the
promoter, an appropriate combination of those which are well known can be used.

Polypeptides encoded by DNA sequences

As mentioned above, the DNA sequences according to the present invention are respectively specified
by the amino acid sequences of the polypeptides encoded thereby. Each of these polypeptides has an
amino acid sequence which corresponds substantially to the amino acid sequence in a specific
region as described above in Figs. 1 - 6 (for example, from A to B in Fig. 1). Here, in the six (A-B, C-D,
E-F, G-H, I-J, K-L) polypeptides shown in Figs. 1 - 6 (i.e. six enzymes participating in the formation of
carotenoids), some of the amino acids can be deleted or substituted, or some amino acids can be added or
inserted, etc., so long as each polypeptide retains the aforementioned enzymatic activity with respect to
its substrate and its converted substance (product). This is indicated by the expression "whose amino acid
sequence corresponds substantially to ..." in the claims. For example, a polypeptide in which the first
amino acid (Met) has been deleted from one of the polypeptides shown in Figs. 1 - 6 is included in such
deleted polypeptides.

The typical polypeptides having enzymatic activities, respectively, in the present invention are those in 
the specific regions in Figs. 1 - 6 described above, and the amino acid sequences of these polypeptides 
have not been known. 

Nucleotide sequences of DNA sequences 

The DNA sequences encoding the respective enzymes are those having the nucleotide sequences in
the aforementioned specific regions in Figs. 1 - 6 (for example, A-B in Fig. 1) or degenerative isomers
thereof, or those having the nucleotide sequences corresponding to the aforementioned alterations of the
amino acid sequences of the respective enzymes or degenerative isomers thereof. The term "degenerative
isomer" means a DNA sequence which differs only in degenerate codons and codes for the same
polypeptide. The preferred embodiments of the DNA sequences according to the present invention are
those having at least one stop codon (such as TAA) at the 3'-terminal. The 5'-upstream and/or the
3'-downstream of the DNA sequences according to the present invention may further have a DNA sequence
with a certain length as a non-translated region (the initial portion of the 3'-downstream usually being a
stop codon such as TAA).
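
The notion of a degenerative isomer can be illustrated with the standard genetic code. The following is a
minimal sketch, not taken from the specification: the two short sequences seq_a and seq_b are hypothetical
examples and do not appear in Figs. 1 - 6, and the helper translate is an assumed name. The two sequences
differ in their codons but encode the same polypeptide, and each ends with the stop codon TAA.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each
# of the three positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding strand from its first base until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequences: both read Met-Lys-Arg followed by the stop codon
# TAA, but they use different codons for Lys (AAA/AAG) and Arg (CGT/AGA).
seq_a = "ATGAAACGTTAA"
seq_b = "ATGAAGAGATAA"

print(translate(seq_a), translate(seq_b))  # MKR MKR
```

In this sense seq_b is a degenerative isomer of seq_a: the nucleotide sequences differ while the encoded
polypeptide is identical.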



Gene group used for the synthesis of carotenoids 

The gene group (the gene cluster in some cases) used for the synthesis of carotenoids comprises a
plurality of the aforementioned DNA sequences ① - ⑥, typical examples of which are illustrated in the
following (1) - (4). Each gene group encodes a plurality of polypeptides of the respective enzymes, and
these enzymes participate in the production reaction of carotenoids to produce them from their substrates.



(1) Gene group used for the synthesis of lycopene 

The gene group used for the synthesis of lycopene, which is a red carotenoid, is a DNA sequence
comprising the aforementioned DNA sequences ①, ④ and ⑤. Such a gene group includes one in
which the respective DNA sequences are present on one DNA strand, one in which they are present
separately on different DNA strands, and one constructed by combining these arrangements as necessary.

In the case that a plurality of DNA sequences are present on one DNA strand, the arrangement order
and direction of the aforementioned DNA sequences ①, ④ and ⑤ may be optional provided that the
genetic information is capable of expression, that is to say, the respective genes in a host are in a state
of being transcribed and translated appropriately.

The biosynthetic pathway of lycopene in E. coli is explained as follows: geranylgeranyl pyrophosphate,
which is a substrate originally present in E. coli, is converted into prephytoene pyrophosphate by the
enzyme encoded by the DNA sequence ⑤, the prephytoene pyrophosphate is then converted into
phytoene by the enzyme encoded by the DNA sequence ①, and the phytoene is further converted into
lycopene by the enzyme encoded by the DNA sequence ④ (see Fig. 8).

Lycopene is a red carotene. It is a pigment which is present in a large amount in the fruits of
watermelon or tomato and is highly safe for food. In this connection, the lycopene
which was synthesized by the DNA sequences according to the present invention in the experimental
example described below had the same stereochemistry as the lycopene present in these plants.

One of the typical existing forms of the gene group of the present invention is a plasmid
which comprises, as a member, the respective DNA sequences each containing a stop codon, or a form in
which the plasmid is present in a host such as E. coli. The plasmid, which is one of the preferred existing
forms of the gene group according to the present invention, comprises the gene group as a passenger or
foreign gene, a replicable plasmid vector present stably in a host, and a promoter (containing
ribosome-binding sites in the case of a procaryote). In procaryotes such as E. coli or Zymomonas species, a
promoter common to the respective DNA sequences can be used, or alternatively a separate
promoter can be used for each DNA sequence. In eucaryotes such as yeast or plants,
a separate promoter is preferably used for each DNA sequence.

The preferred existing forms of the DNA sequences are described above in the explanation of
the DNA sequences ① - ⑥.



(2) Gene group used for the synthesis of β-carotene

The gene group used for the synthesis of β-carotene, which is one of the yellow-orange carotenoids, is a
DNA sequence comprising the aforementioned DNA sequences ①, ③, ④ and ⑤. In other words, the
gene group used for the synthesis of β-carotene is formed by adding the DNA sequence ③ to the DNA
sequence used for the synthesis of lycopene, which comprises the DNA sequences ①, ④ and ⑤. The
gene group includes one in which the respective DNA sequences constituting the gene group are
present on one DNA strand, one in which they are present individually on different DNA strands, and one
constructed by combining these arrangements as necessary.

In the case that a plurality of DNA sequences are present on one DNA strand, the arrangement order
and direction of the aforementioned DNA sequences ①, ③, ④ and ⑤ may be optional provided that the
genetic information is capable of expression, that is to say, the respective genes in a host are in a state
of being transcribed and translated appropriately.

The biosynthetic pathway of β-carotene in E. coli is explained as follows: geranylgeranyl pyrophosphate,
which is a substrate originally present in E. coli, is converted into prephytoene pyrophosphate by the
enzyme encoded by the DNA sequence ⑤, the prephytoene pyrophosphate is converted into phytoene by
the enzyme encoded by the DNA sequence ①, the phytoene is further converted into lycopene by the
enzyme encoded by the DNA sequence ④, and the lycopene is further converted into β-carotene by the
enzyme encoded by the DNA sequence ③ (see Fig. 8).

β-carotene is a typical carotene whose color is in the spectrum ranging from yellow to orange. It is
an orange pigment which is present in a large amount in the roots of carrot or the green leaves of plants
and is highly safe for food. The utility of β-carotene has already been described in the explanation of
related art. In this connection, the β-carotene which was synthesized by the DNA sequences according to
the present invention in the experimental example described below had the same stereochemistry as the
β-carotene present in the roots of carrot or the green leaves of plants.

One of the typical existing forms of the gene group and the individual DNA sequences is the same as
defined in (1).

(3) Gene group used for the synthesis of zeaxanthin

The gene group used for the synthesis of zeaxanthin, which is one of the yellow-orange carotenoids, is a
DNA sequence comprising the aforementioned DNA sequences ①, ③, ④, ⑤ and ⑥. In other words, the
DNA sequence used for the synthesis of zeaxanthin is formed by adding the DNA sequence ⑥ to the DNA
sequence used for the synthesis of β-carotene, which comprises the DNA sequences ①, ③, ④ and ⑤. The
gene group includes one in which the respective DNA sequences constituting the gene group are present
on one DNA strand, one in which they are present individually on different DNA strands, and one
constructed by combining these arrangements as necessary.

In the case that a plurality of DNA sequences are present on one DNA strand, the arrangement order
and direction of the aforementioned DNA sequences ①, ③, ④, ⑤ and ⑥ may be optional provided that
the genetic information is capable of expression, that is to say, the respective genes in a host are in a
state of being transcribed and translated appropriately.

The biosynthetic pathway of zeaxanthin in E. coli is explained as follows: geranylgeranyl pyrophosphate,
which is a substrate originally present in E. coli, is converted into prephytoene pyrophosphate by the
enzyme encoded by the DNA sequence ⑤, the prephytoene pyrophosphate is converted into phytoene by
the enzyme encoded by the DNA sequence ①, the phytoene is then converted into lycopene by the
enzyme encoded by the DNA sequence ④, the lycopene is further converted into β-carotene by the
enzyme encoded by the DNA sequence ③, and finally the β-carotene is converted into zeaxanthin by the
enzyme encoded by the DNA sequence ⑥ (see Fig. 8).

Zeaxanthin is a xanthophyll whose color is in the spectrum ranging from yellow to orange. It is a
yellow pigment which is present in the seeds of maize and is highly safe for food. Zeaxanthin is contained
in feeds for hens or colored carp and is an important pigment source for coloring them. In this connection,
the zeaxanthin which was synthesized by the DNA sequences according to the present invention in the
experimental example described below had the same stereochemistry as the zeaxanthin described above.

One of the typical existing forms of the gene group and the individual DNA sequences is the same as 
defined in (1). 

(4) Gene group used for the synthesis of zeaxanthin-diglucoside

The gene group used for the synthesis of zeaxanthin-diglucoside, which is one of the yellow-orange
carotenoids, is a DNA sequence comprising the aforementioned DNA sequences ① - ⑥. In other words,
the gene group used for the synthesis of zeaxanthin-diglucoside is formed by adding the DNA sequence ②
to the DNA sequence used for the synthesis of zeaxanthin, which comprises the DNA sequences ①, ③, ④,
⑤ and ⑥. The gene group includes one in which the respective DNA sequences constituting the gene
group are present on one DNA strand, one in which they are present individually on different DNA
strands, and one constructed by combining these arrangements as necessary.

In the case that a plurality of DNA sequences are present on one DNA strand, the arrangement order
and direction of the aforementioned DNA sequences ① - ⑥ may be optional provided that the genetic
information is capable of expression, that is to say, the respective genes in a host are in a state of being
transcribed and translated appropriately.

One of the typical existing forms of the gene group and the individual DNA sequences is the same as 
defined in (1). 

The biosynthetic pathway of zeaxanthin-diglucoside in E. coli is explained as follows: geranylgeranyl
pyrophosphate, which is a substrate originally present in E. coli, is converted into prephytoene
pyrophosphate by the enzyme encoded by the DNA sequence ⑤, the prephytoene pyrophosphate is
converted into phytoene by the enzyme encoded by the DNA sequence ①, the phytoene is then converted
into lycopene by the enzyme encoded by the DNA sequence ④, the lycopene is further converted into
β-carotene by the enzyme encoded by the DNA sequence ③, the β-carotene is then converted into
zeaxanthin by the enzyme encoded by the DNA sequence ⑥, and the zeaxanthin is finally converted into
zeaxanthin-diglucoside by the enzyme encoded by the DNA sequence ② (see Fig. 8).

Zeaxanthin-diglucoside is a carotenoid glycoside with high water solubility; it is a pigment which
dissolves sufficiently in water at room temperature and exhibits a clear yellow color. Carotenoid pigments
are generally hydrophobic and are thus limited in their use as natural coloring materials in foods or the
like. Zeaxanthin-diglucoside overcomes this drawback. Zeaxanthin-diglucoside has been isolated from
saffron, Crocus sativus, an edible plant [Pure & Appl. Chem., 47, 121-128 (1976)], so that its safety for
food is thought to have been confirmed. Therefore, zeaxanthin-diglucoside is desirable as a yellow natural
coloring material for foods or the like. In this connection, there have heretofore been no reports of the
isolation of zeaxanthin-diglucoside from microorganisms.

When carotenoid pigments such as lycopene, β-carotene, zeaxanthin and zeaxanthin-diglucoside are
to be produced using E. coli as the host, the aforementioned DNA sequences ①, ④ and ⑤, the DNA
sequences ①, ③, ④ and ⑤, the DNA sequences ①, ③, ④, ⑤ and ⑥, and the DNA sequences ① - ⑥ are
required, respectively. However, when a host other than E. coli is used, particularly one
which is capable of producing carotenoids, it is highly likely to contain carotenoid
precursors further downstream in the biosynthesis as well, so that not all of the aforementioned DNA
sequences ①, ④ and ⑤ (for the production of lycopene), all of the DNA sequences ①, ③, ④ and ⑤ (for the
production of β-carotene), all of the DNA sequences ①, ③, ④, ⑤ and ⑥ (for the production of
zeaxanthin), or all of the DNA sequences ① - ⑥ (for the production of zeaxanthin-diglucoside) are always
required.

That is to say, in this case it is also possible to use only the DNA sequence(s) participating in the
formation of the target carotenoid pigment from the carotenoid precursor present furthest downstream in
the host. Thus, when lycopene is to be produced as the target carotenoid in a host in which phytoene is
already present, it is also possible to use only the DNA sequence ④ among the DNA sequences ①, ④
and ⑤.

It is also possible to make a host produce, as the target carotenoid pigment-related compound,
prephytoene pyrophosphate from geranylgeranyl pyrophosphate by using only the DNA sequence ⑤ of the
present invention, or phytoene by using the DNA sequences ① and ⑤ of the present invention or, if the
host contains prephytoene pyrophosphate, by using only the DNA sequence ①.
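
The rule described above, namely that only the sequences acting between the precursor already present in
the host and the target compound are needed, can be sketched as a simple lookup. This is an illustration
under stated assumptions: the names PATHWAY and required_sequences and the spelling "beta-carotene" are
introduced for this example, while the step order follows the pathway described for Fig. 8.

```python
# Linear pathway as (substrate, DNA sequence number, product), in the order
# given in the text for E. coli.
PATHWAY = [
    ("geranylgeranyl pyrophosphate", 5, "prephytoene pyrophosphate"),
    ("prephytoene pyrophosphate", 1, "phytoene"),
    ("phytoene", 4, "lycopene"),
    ("lycopene", 3, "beta-carotene"),
    ("beta-carotene", 6, "zeaxanthin"),
    ("zeaxanthin", 2, "zeaxanthin-diglucoside"),
]


def required_sequences(target, host_precursor="geranylgeranyl pyrophosphate"):
    """Return the sorted DNA sequence numbers needed to reach `target`
    from the furthest-downstream precursor present in the host."""
    compounds = [PATHWAY[0][0]] + [step[2] for step in PATHWAY]
    start, end = compounds.index(host_precursor), compounds.index(target)
    if end <= start:
        raise ValueError("target must lie downstream of the host precursor")
    return sorted(step[1] for step in PATHWAY[start:end])


print(required_sequences("lycopene"))                             # [1, 4, 5]
print(required_sequences("lycopene", host_precursor="phytoene"))  # [4]
print(required_sequences("phytoene"))                             # [1, 5]
```

With the default precursor (geranylgeranyl pyrophosphate, as in E. coli) the function reproduces the
sequence sets listed for each carotenoid; with a host that already contains phytoene, only sequence ④ is
returned for lycopene.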



Acquirement of DNA sequences 

One method for acquiring the DNA sequences ① - ⑥, which contain the nucleotide sequences coding for
the amino acid sequences of the respective enzymes, is the chemical synthesis of at least a part of their
strand by a method of polynucleotide synthesis. However, considering the large number of amino acids
bonded together, it is preferable to chemical synthesis to acquire the DNA
sequences from the DNA library of Erwinia uredovora 20D3 ATCC 19321 according to a conventional
method in the field of genetic engineering, for example, the hybridization method with a suitable probe.

The individual DNA sequences or the DNA sequence comprising all of these sequences are thus
obtained.



Transformant

The aforementioned gene group comprising a plurality of the DNA sequences ① - ⑥ can be
constituted by using the DNA sequences obtained as described above. The DNA sequence thus obtained
contains the genetic information for making an enzyme participating in the formation of carotenoids, so
that it can be introduced into an appropriate host by a biotechnological method to form a transformant
and to produce an enzyme and, in turn, a carotenoid pigment or a carotenoid pigment-related compound.

(1) Host 


Plants and a variety of microorganisms can be the target of transformation by a vector comprising the
aforementioned DNA sequences, as long as a suitable host-vector system is present. However, the host is
required to contain geranylgeranyl pyrophosphate, which is the substrate compound of the enzyme that
starts the carotenoid synthesis with use of the DNA sequences of the present invention, or a compound
further downstream from it.

It is known that geranylgeranyl pyrophosphate is synthesized by dimethylallyltransferase, which is a
common enzyme at the initial stage of the biosynthesis of not only carotenoids but also sterols or terpenes
[J. Biochem., 72, 1101-1108 (1972)]. Accordingly, if a cell which cannot synthesize carotenoids can
synthesize sterols or terpenes, it probably contains geranylgeranyl pyrophosphate. It is believed that any
cell contains at least one sterol or terpene.

Therefore, it is believed theoretically that almost all hosts are capable of synthesizing carotenoids by 
using the DNA sequences of the present invention as far as a suitable host-vector system is present. 

Hosts for which a host-vector system is present include plants such as Nicotiana
tabacum, Petunia hybrida and the like, and microorganisms such as bacteria, for example Escherichia coli,
Zymomonas mobilis and the like, and yeasts, for example Saccharomyces cerevisiae and the like.



(2) Transformation 

It is confirmed for the first time by the present invention that the genetic information present on the
DNA sequences of the present invention is expressed in microorganisms. However, the procedures
or methods for making the transformants (and for the production of enzymes and, in turn, carotenoid
pigments or carotenoid pigment-related compounds by the transformants) are per se conventional in the
fields of molecular biology, cell biology or genetic manipulation, and thus the procedures other than those
described below may be performed in accordance with these conventional techniques.

In order to express the gene of the DNA sequences according to the present invention in a host, it is
necessary to insert the gene into a vector for introducing it into the host. As the vector used in this stage,
any of various known vectors can be used, such as pBI121 or the like for plants (Nicotiana tabacum, Petunia
hybrida); pUC19, pACYC184 or the like for E. coli; pZA22 or the like for Zymomonas mobilis (see Japanese
Patent Laid-Open Publication No. 228278/87); and YEp13 or the like for yeast.

On the other hand, it is necessary to transcribe the DNA sequence of the present invention into mRNA
in order to express the gene of the DNA sequence in the host. For this purpose, a promoter as a signal for
the transcription may be integrated into the 5'-upstream region of the DNA sequence of the present
invention. A variety of promoters are known, such as CaMV35S, NOS, TR1', TR2' (for plants); lac, Tc^r, CAT, trp
(for E. coli); Tc^r, CAT (for Zymomonas mobilis); ADH1, GAL7, PGK, TRP1 (for yeast) and the like, and
any of these promoters can be used in the present invention.

In the case of a prokaryote, it is necessary to place a ribosome-binding site (the SD sequence in E. coli)
several bases upstream from the initiation codon (ATG).

In this connection, while the aforementioned manipulation is necessary for producing the enzyme
protein, one or more amino acids may be inserted into or added to the polypeptide which is illustrated in
the specific ranges of Figs. 1 - 6 (e.g. the polypeptide A - B illustrated in Fig. 1), or one or more amino acids
may be deleted or replaced, as described above.

The transformation of the host with the plasmid thus obtained can be conducted by any
appropriate method which is conventionally used in the fields of genetic manipulation or cell biology. As for
the general matters, reference can be made to appropriate publications or reviews; for example, as for the
transformation of microorganisms, T. Maniatis, E. F. Fritsch and J. Sambrook: "Molecular Cloning: A
Laboratory Manual", Cold Spring Harbor Laboratory (1982).

The transformant is the same as the host used in its genotype, phenotype and bacteriological properties,
except for the new trait derived from the genetic information introduced by the DNA sequence of the present
invention (that is, the production of an enzyme participating in carotenoid formation and the synthesis of
carotenoids or the like by the enzyme), the trait derived from the vector used, and the loss of any trait
corresponding to the deletion of a part of the genetic information of the vector which might be caused on
the recombination of genes. Escherichia coli JM109 (pCAR1), which is an example of the transformant
according to the present invention, is deposited as FERM BP-2377.

Expression of genetic information/production of carotenoids

The clone of the transformant obtained as described above produces, mainly within the transformant, an
enzyme participating in carotenoid formation, and a variety of carotenoids or carotenoid pigment-related
compounds are synthesized by the enzyme.

The culture and culturing conditions of the transformant are essentially the same as those for the host used.
Carotenoids can be recovered by the methods illustrated, for example, in Experimental Examples 3 and
4 below.

Furthermore, each enzyme protein coded by each DNA sequence of the present invention is produced
mainly in the cell in the case of the transformation of E. coli, and it can be recovered by an appropriate
method.

BRIEF DESCRIPTION OF THE DRAWINGS

Figs. 1 - 6 illustrate the nucleotide sequences in the coding regions of the DNA sequences ① - ⑥, and
the amino acid sequences of the proteins to be encoded, respectively,
Fig. 7 illustrates the KpnI-HindIII fragment which was acquired from Erwinia uredovora 20D3 ATCC
19321 and relates to the biosynthesis of carotenoids, that is, the complete nucleotide sequence of the 6918
bp DNA sequence containing the DNA sequences in Figs. 1 - 6, and
Fig. 8 illustrates the function of the polypeptides encoded by the aforementioned DNA sequences ①
- ⑥.

Experiments

All of the strains used in the following experiments are deposited in ATCC or other deposition organizations
and are freely available.

Experimental Example 1: Cloning of a gene cluster participating in the biosynthesis of a yellow pigment
(referred to hereinafter as yellow pigment-synthesizing gene cluster)



(1) Preparation of total DNA

Total DNA was prepared from the cells of Erwinia uredovora 20D3 ATCC 19321 which had been
grown to the early stationary phase in 100 ml of LB medium (1% tryptone, 0.5% yeast extract, 1%
NaCl). Penicillin G (manufactured by Meiji Seika) was added to the culture medium to a
concentration of 50 units/ml 1 hour before the cells were harvested. After the cells were harvested
by centrifugation, they were washed with TES buffer (20 mM Tris, 10 mM EDTA, 0.1 M NaCl, pH 8),
heat-treated at 68°C for 15 minutes and suspended in Solution I (50 mM glucose, 25 mM Tris, 10 mM
EDTA, pH 8) containing 5 mg/ml of lysozyme (manufactured by Seikagaku Kogyo) and 100 µg/ml of RNase
A (manufactured by Sigma). The suspension was incubated at 37°C for a period of 30 minutes - 1 hour,
and pronase E (manufactured by Kaken Seiyaku) was added to a concentration of 250 µg/ml,
followed by incubation at 37°C for 10 minutes. Sodium N-lauroylsarcosine (manufactured by Nacalai Tesque)
was added to a final concentration of 1%, and the mixture was agitated and then incubated at
37°C for several hours. Extraction was conducted several times with phenol/chloroform. While 2 volumes
of ethanol were slowly added, the resulting total DNA was wound around a glass rod, rinsed
with 70% ethanol and dissolved in 2 ml of TE buffer (10 mM Tris, 1 mM EDTA, pH 8) to give the total DNA
preparation.



(2) Construction of an Escherichia coli cosmid library and acquisition of E. coli transformants producing
yellow pigments

Incubation was conducted with 1 unit of restriction enzyme Sau3AI per 50 µl of the total DNA
preparation at 37°C for 30 minutes, followed by inactivation of the restriction enzyme at 68°C for
10 minutes. Under this condition, many fragments partially digested with Sau3AI were obtained in the
neighbourhood of 40 kb. After ethanol precipitation of this reaction solution, half of it was mixed with
2.5 µg of cosmid pJB8 which had been digested with BamHI and treated with alkaline phosphatase and 0.2
µg of a pJB8 SalI-BamHI right arm fragment (smaller fragment) which had been recovered from a gel, and
the mixture, 40 µl in total, was subjected to ligation reaction with T4 DNA ligase at 12°C for 2 days. In this
connection, the cosmid pJB8 had been previously purchased from Amersham. Restriction enzymes and
enzymes used for genetic manipulation were purchased from Boehringer-Mannheim, Takara Shuzo or Wako
Pure Chemical Industries. The DNA thus ligated was used for in
vitro packaging with a Gigapack Gold (manufactured by Stratagene, marketed by Funakoshi) to give a
large amount of phage particles sufficient for construction of a cosmid library. Escherichia coli DH1
(ATCC 33849) was infected with the phage particles. After the infected cells of E. coli DH1 were diluted so as
to give 100 colonies per plate, they were plated on LB plates and cultured at 37°C overnight and further at
30°C for 6 hours or more. As a result, E. coli transformants producing yellow pigments appeared in a
proportion of one colony per about 1,100 colonies. These E. coli transformants producing yellow pigments
contained plasmids in which 33 - 47 kb Sau3AI partial digestion fragments were inserted into pJB8.



(3) Location of a yellow pigment-synthesizing gene cluster

A yellow pigment-synthesizing gene cluster was inserted into pJB8 as the 33 - 47 kb Sau3AI partial
digestion fragments. One of these fragments was further subjected to partial digestion with Sau3AI, ligated
to the BamHI site of the E. coli vector pUC19 (purchased from Takara Shuzo), and used to transform
Escherichia coli JM109 (manufactured by Takara Shuzo). To locate the yellow pigment-synthesizing gene
cluster, plasmid DNAs were prepared from 50 E. coli transformants producing yellow pigments which
appeared on the LB plate containing ampicillin, and analyzed by agarose gel electrophoresis. As a result, it
was found that the smallest inserted fragment was 8.2 kb. The plasmid containing this 8.2 kb fragment
was named pCAR1, and E. coli JM109 harboring this plasmid was named Escherichia coli JM109
(pCAR1). This strain produced the same yellow pigments as those of E. uredovora. The 8.2 kb fragment
contained a KpnI site near the terminal at the lac promoter side and a HindIII site
near the opposite terminal. After the 8.2 kb fragment was subjected to double digestion with
KpnI/HindIII (HindIII was partially digested; the 8.2 kb fragment had two HindIII sites), the KpnI-HindIII
fragment (6.9 kb) was recovered from a gel and ligated to the KpnI-HindIII site of pUC18 (this hybrid
plasmid was named pCAR15). Upon the transformation of E. coli JM109, the E. coli transformant
exhibited a yellow color and produced the same yellow pigments as those of E. uredovora. Accordingly, it was
found that the genes required for the yellow pigment production were located on the KpnI-HindIII
fragment (6.9 kb). That is to say, the fragment carrying the yellow pigment-synthesizing genes could
be reduced to 6.9 kb in size.



Experimental Example 2: Analysis of the yellow pigment-synthesizing gene cluster



(1) Determination of the nucleotide sequence of the yellow pigment-synthesizing gene cluster

The complete nucleotide sequence of the 6.9 kb KpnI-HindIII fragment was determined by the
kilo-sequence method using Deletion kit for kilo-sequence (manufactured by Takara Shuzo) and the dideoxy
method according to Proc. Natl. Acad. Sci. USA, 74, 5463-5467 (1977). As a result, it was found that the
KpnI-HindIII fragment containing the yellow pigment-synthesizing genes (DNA strand) was 6918 base pairs
(bp) in length and its GC content was 54%. The complete nucleotide sequence is shown in Fig. 7 (a) -
(g). The KpnI site is represented by the base number 1.
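
The GC content quoted above is the fraction of G and C bases over the whole strand. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from Fig. 7.

```python
def gc_content(seq: str) -> float:
    """Return the fraction (0.0 - 1.0) of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Count only unambiguous G and C bases.
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Toy sequence for illustration only; it is NOT part of the 6918 bp fragment.
toy = "ATGCGCGGTACCTTAGGC"
print(f"GC content of toy sequence: {gc_content(toy):.1%}")
```

Applied to the full sequence of Fig. 7, the same function would be expected to return a value near 0.54.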

(2) Elucidation of yellow pigment-synthesizing gene cluster

The HindIII side of the 6918 bp fragment (DNA strand) containing the yellow pigment-synthesizing
genes (right terminal side in Fig. 7) was deleted with Deletion kit for kilo-sequence. A hybrid plasmid
(designated pCAR25) was constructed by inserting into pUC19 a 1 - 6503 fragment which was obtained by
deletion from the HindIII site to nucleotide position 6504. E. coli JM109 harboring pCAR25 [referred to
hereinafter as E. coli (pCAR25)] exhibited a yellow color and produced the same yellow pigments as those of E.
uredovora. Therefore, it was thought that the region from the base number 6504 to 6918 in Fig. 7 was not
required for yellow pigment production. The nucleotide sequence in the region from the base number 1 to
6503 in the 6918 bp DNA sequence containing the yellow pigment-synthesizing genes was analyzed. As a
result, it was found that there were six open reading frames (ORFs). That is to say, there were an ORF
coding for a polypeptide with a molecular weight of 32,583 from the base number 225 to 1130 (referred to
as ORF1, which corresponds to A - B in Figs. 1 and 7), an ORF coding for a polypeptide with a molecular
weight of 47,241 from the base number 1143 to 2435 (referred to as ORF2, which corresponds to C - D in
Figs. 2 and 7), an ORF coding for a polypeptide with a molecular weight of 43,047 from the base number
2422 to 3567 (referred to as ORF3, which corresponds to E - F in Figs. 3 and 7), an ORF coding for a
polypeptide with a molecular weight of 55,007 from the base number 3582 to 5057 (referred to as ORF4,
which corresponds to G - H in Figs. 4 and 7), an ORF coding for a polypeptide with a molecular weight of
33,050 from the base number 5096 to 5983 (referred to as ORF5, which corresponds to I - J in Figs. 5 and
7), and an ORF coding for a polypeptide with a molecular weight of 19,816 from the base number 6452 to
5928 (referred to as ORF6, which corresponds to K - L in Figs. 6 and 7; only this ORF6 has the opposite
orientation to the others). In this connection, each ORF contained, several bases upstream from
its initiation codon, the SD (Shine-Dalgarno) sequence which is complementary to the 3'-region of 16S
ribosomal RNA of E. coli. Thus, it was thought that polypeptides were in fact synthesized in E. coli from these
six ORFs. This was confirmed by the following in vitro transcription-translation experiment.
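
As a consistency check on the coordinates listed above, the following sketch tabulates the length and orientation of each ORF and confirms that every length is a whole number of codons. The `Orf` class and its field names are illustrative assumptions; only the base numbers and molecular weights come from the text, and whether the quoted ranges include the stop codon is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int  # base number at the initiation-codon end of the ORF
    end: int    # base number at the other end of the ORF
    mw: int     # molecular weight of the predicted polypeptide

    @property
    def length_bp(self) -> int:
        # Coordinates are 1-based and inclusive.
        return abs(self.end - self.start) + 1

    @property
    def strand(self) -> str:
        # start > end means the ORF is read on the complementary strand.
        return "+" if self.end > self.start else "-"


ORFS = [
    Orf("ORF1", 225, 1130, 32583),
    Orf("ORF2", 1143, 2435, 47241),
    Orf("ORF3", 2422, 3567, 43047),
    Orf("ORF4", 3582, 5057, 55007),
    Orf("ORF5", 5096, 5983, 33050),
    Orf("ORF6", 6452, 5928, 19816),
]

for orf in ORFS:
    codons, remainder = divmod(orf.length_bp, 3)
    print(f"{orf.name}: {orf.length_bp} bp, {codons} codons "
          f"(remainder {remainder}), strand {orf.strand}")
```

Under these coordinates ORF2 and ORF3 share a short stretch (2422 to 2435), and ORF5 and ORF6 overlap on opposite strands (5928 to 5983).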

That is to say, the in vitro transcription-translation analysis was carried out with DNA obtained by
digesting the plasmid pCAR25 containing ORF1 - ORF6 with ScaI, and with DNAs obtained by cutting out
the fragments containing the respective ORFs of ORF1 - ORF6 (each including its SD sequence)
with appropriate restriction enzymes, isolating them, inserting each into pUC19 or pUC18 so that it was
subjected to transcriptional read-through from the lac promoter, and then digesting with ScaI. In this
experiment, a Prokaryotic DNA-directed translation kit manufactured by Amersham was used. As a result,
it was confirmed that the bands of polypeptides corresponding to the aforementioned respective ORFs were
detected as the transcription-translation products.

Moreover, all of the six ORFs were necessary for production of the same yellow pigments as those of E.
uredovora, as described below (Experimental Examples 3, 4 and 5). From these results, ORF1, ORF2,
ORF3, ORF4, ORF5 and ORF6 were designated as the zexA, zexB, zexC, zexD, zexE and zexF genes,
respectively.

The base numbers in Figs. 1 - 6 are represented on the basis of the KpnI site in Fig. 7 as the base
number 1 and correspond to each other. The marks A - L in Figs. 1 - 6 correspond to the marks A - L in
Fig. 7. The DNA sequence from K to L in Fig. 6 is that of the complementary strand of the DNA
sequence from K to L in Fig. 7. That is to say, the DNA sequence illustrated in Fig. 6 has the opposite
orientation in transcription to the DNA sequences in Figs. 1 - 5 in the original DNA sequence (Fig. 7).
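
The relation between Fig. 6 and Fig. 7 described above amounts to taking a reverse complement. A minimal sketch, assuming a plain-string representation of the Fig. 7 strand; the helper names `reverse_complement` and `segment` and the toy sequence are hypothetical:

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read in its own 5' to 3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def segment(seq: str, first: int, last: int) -> str:
    """Extract bases first..last using 1-based inclusive base numbers.

    If first > last (as for the ORF running from 6452 to 5928), the segment
    is returned as read on the complementary strand.
    """
    if first <= last:
        return seq[first - 1:last]
    return reverse_complement(seq[last - 1:first])


# Toy strand for illustration only; it is NOT the sequence of Fig. 7.
toy = "ATGCCGTA"
print(segment(toy, 1, 3))  # forward-strand read
print(segment(toy, 8, 6))  # complementary-strand read of bases 6..8
```

With the real Fig. 7 sequence, `segment(seq, 6452, 5928)` would be expected to give the coding strand shown in Fig. 6.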

(3) Analysis of homology by the DNA-DNA hybridization method

Total DNA of Erwinia herbicola Eho 10 ATCC 39368 was prepared in the same manner as in
Experimental Example 1 (1). A 7.6 kb fragment containing the DNA sequence in Fig. 7 was cut out from the
hybrid plasmid pCAR1 by KpnI digestion and labeled with DNA labeling & detection kit nonradioactive
(manufactured by Boehringer-Mannheim) according to the DIG-ELISA method to give probe DNA. The
homology of total DNAs (intact or KpnI digested) of E. herbicola Eho 10 ATCC 39368 and E. uredovora
20D3 ATCC 19321 with this probe DNA was analyzed by the DNA-DNA hybridization method with the
aforementioned DNA labeling & detection kit nonradioactive. As a result, the probe DNA hybridized
strongly with total DNA of the latter, E. uredovora 20D3 ATCC 19321, but not at all with total DNA of the
former, E. herbicola Eho 10 ATCC 39368. Also, the restriction map deduced from the DNA sequence in Fig.
7 was quite different from that reported in J. Bacteriol., 168, 607-612 (1986). It was concluded from the
above described results that the DNA sequence in Fig. 7, that is, the DNA sequences useful for the
synthesis of carotenoids according to the present invention, exhibits no homology with the DNA sequence
containing the yellow pigment-synthesizing genes of E. herbicola Eho 10 ATCC 39368.

Experimental Example 3: Analysis of yellow pigments

E. coli (pCAR25) produced the same yellow pigments as those of E. uredovora 20D3 ATCC 19321 and
E. herbicola Eho 10 ATCC 39368, and its yield was 5 times higher than that of the former and 6 times
higher than that of the latter (per dry weight). The cells harvested from 8 liters of 2 x YT medium (1.6%
tryptone, 1% yeast extract, 0.5% NaCl) were extracted once with 1.2 liter of methanol. The methanol extract
was evaporated to dryness, dissolved in methanol, and subjected to thin layer chromatography (TLC) with
silica gel 60 (Merck) (developed with chloroform : methanol = 4:1). The yellow pigments were separated
into 3 spots having Rf values of 0.93, 0.62 and 0.30 by TLC. The yellow (to orange) pigment at the Rf value
of 0.30, which was the strongest spot, was scraped from the TLC plate, extracted with a small amount of
methanol, loaded on a Sephadex LH-20 column for chromatography [30 cm x 3.0 cm (diameter)] and developed
and eluted with methanol to give 4 mg of a pure product. The yellow (to orange) pigment obtained was
sparingly soluble in organic solvents other than methanol and easily soluble in water, so that it was
suggested that it might be a carotenoid glycoside. This suggestion was also supported by a molecular
weight of 892 by FD-MS spectrum [the mass of this pigment was larger than that of zeaxanthin (described
hereinafter) by the mass of two glucose units]. When this substance was hydrolyzed with 1 N HCl at 100°C for 10
minutes, zeaxanthin was obtained. Then, acetylation was conducted according to the usual method. That is,
the substance was dissolved in 10 ml of pyridine, a large excess of acetic anhydride was added, and the
mixture was stirred at room temperature and left standing overnight. After the completion of reaction, water
was added to the mixture and chloroform extraction was carried out. The chloroform extract was
concentrated and loaded on a silica gel column [30 cm x 3.0 cm (diameter)] for chromatography, which was
developed and eluted with chloroform. Measurement of ¹H-NMR gave a spectrum identical with that of the
tetraacetyl derivative of zeaxanthin-β-diglucoside [Helvetica Chimica Acta, 57, 1641-1651 (1974)], so that
the substance was identified as zeaxanthin-β-diglucoside (its structure being illustrated below).
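
The mass argument above can be checked with simple arithmetic. The sketch below assumes nominal (integer) masses and assumes that each glycosidic bond is formed with loss of one water molecule; the value 568 is the FD-MS figure reported for zeaxanthin in Experimental Example 4, while the glucose and water masses are general chemistry values not stated in the text. The function name `glycoside_mass` is hypothetical.

```python
ZEAXANTHIN = 568  # m/e reported for zeaxanthin by FD-MS (Experimental Example 4)
GLUCOSE = 180     # nominal mass of C6H12O6 (assumption: general chemistry value)
WATER = 18        # nominal mass of H2O lost per glycosidic bond (assumption)


def glycoside_mass(aglycone: int, sugar: int, n_sugars: int) -> int:
    """Nominal mass of an aglycone carrying n sugars via condensation bonds."""
    return aglycone + n_sugars * (sugar - WATER)


diglucoside = glycoside_mass(ZEAXANTHIN, GLUCOSE, 2)
print(f"Predicted nominal mass of zeaxanthin-diglucoside: {diglucoside}")
print(f"Matches the FD-MS value of 892: {diglucoside == 892}")
```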

The yield was 1.1 mg/g dry weight. The substance had a solubility of at least 2 mg in 100 ml of water
and of methanol, and water was superior to methanol in solubility of the substance. The substance had low
solubilities in chloroform and acetone, and its solubilities were 0.5 mg in 100 ml of these solvents.



Experimental Example 4: Analysis of the metabolic intermediates of carotenoids

(1) Construction of various deletion plasmids



A hybrid plasmid (designated as pCAR16) was constructed by inserting a 1 - 6009 fragment which was
obtained by deletion to nucleotide position 6010 from the HindIII site (right terminal in Fig. 7) of the 6918 bp
fragment containing the yellow pigment-synthesizing genes (DNA strand) (Fig. 7) using Deletion kit for
kilo-sequence. pCAR16 contains the genes from zexA to zexE. Various deletion plasmids were constructed, as
shown in Table 1, on the basis of pCAR16 and the aforementioned hybrid plasmid pCAR25 (containing
the genes from zexA to zexF).



The number within parentheses after the name of each restriction enzyme represents the
base number at the beginning of the recognition site of the restriction enzyme. The base numbers correspond to
those in Figs. 1 - 6 and Fig. 7. Analysis of the metabolic intermediates of carotenoids was performed using
the transformants of E. coli JM109 by the various deletion plasmids [referred to hereinafter as E. coli (name of
plasmid)].

Table 1: Construction of various deletion plasmids

Plasmid       Construction method                                           Genes functioning
------------  ------------------------------------------------------------  ----------------------------------
pCAR25        See text                                                      zexA zexB zexC zexD zexE zexF
pCAR25delB    Frame shift in BstEII (1235) of pCAR25                        zexA zexC zexD zexE zexF
pCAR16        See text                                                      zexA zexB zexC zexD zexE
pCAR16delB    Frame shift in BstEII (1235) of pCAR16                        zexA zexC zexD zexE
pCAR16delC    Frame shift in SnaBI (3497) of pCAR16                         zexA zexB zexD zexE
pCAR-ADE      Deletion of the BstEII (1235) - SnaBI (3497) fragment         zexA zexD zexE
              from pCAR16
pCAR-ADEF     Deletion of the BstEII (1235) - SnaBI (3497) fragment         zexA zexD zexE zexF
              from pCAR25
pCAR25delD    Frame shift in BamHI (3652) of pCAR25                         zexA zexB zexC zexE zexF
pCAR-AE       Deletion of the BstEII (1235) - BamHI (3652) fragment         zexA zexE
              from pCAR16
pCAR-A        Insertion of the KpnI (1) - BstEII (1235) fragment in pUC19   zexA
pCAR-E        Insertion of the Eco52I (4926) - 6009 fragment in pUC19       zexE
pCAR25delE    Frame shift in MluI (5379) of pCAR25                          zexA zexB zexC zexD zexF
pCAR25delA    Frame shift in AvaI (995) of pCAR25                           zexB zexC zexD zexE zexF
pCAR-CDE      Insertion of the SalI (2295) - 6009 fragment in pUC19         zexC zexD zexE
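
The "Genes functioning" column of Table 1 can be cross-checked against the ORF coordinates given in Experimental Example 2. The following is only a sketch under a simplifying assumption: a gene is treated as functional when its whole coding region lies inside the cloned fragment and is not touched by the deleted segment, and promoter or SD-sequence effects are ignored. The function names `genes_at` and `intact_genes` are hypothetical.

```python
from typing import List, Optional, Tuple

# Coding-region coordinates (base numbers of Fig. 7) from Experimental Example 2.
ORFS = {
    "zexA": (225, 1130),
    "zexB": (1143, 2435),
    "zexC": (2422, 3567),
    "zexD": (3582, 5057),
    "zexE": (5096, 5983),
    "zexF": (5928, 6452),  # encoded on the complementary strand
}


def genes_at(position: int) -> List[str]:
    """Genes whose coding region contains the given base number."""
    return [g for g, (lo, hi) in ORFS.items() if lo <= position <= hi]


def intact_genes(first: int, last: int,
                 deletion: Optional[Tuple[int, int]] = None) -> List[str]:
    """Genes lying wholly within first..last and untouched by the deletion."""
    kept = []
    for gene, (lo, hi) in ORFS.items():
        inside = first <= lo and hi <= last
        hit = deletion is not None and lo <= deletion[1] and deletion[0] <= hi
        if inside and not hit:
            kept.append(gene)
    return kept


# Frame-shift sites used in Table 1 and the gene each one falls in.
for enzyme, pos in [("AvaI", 995), ("BstEII", 1235), ("SnaBI", 3497),
                    ("BamHI", 3652), ("MluI", 5379)]:
    print(f"{enzyme} ({pos}) lies in {genes_at(pos)}")

# pCAR-ADE: BstEII (1235) - SnaBI (3497) deleted from pCAR16 (bases 1 - 6009).
print("pCAR-ADE:", intact_genes(1, 6009, deletion=(1235, 3497)))
```

Under this simple model the predicted gene sets agree with the plasmid names, for example zexA, zexD and zexE for pCAR-ADE.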

(2) Identification of zeaxanthin

The cells of E. coli (pCAR25delB) (exhibiting an orange color) harvested from 3 liters of 2 x YT medium were
extracted twice with 400 ml portions of acetone at low temperature, and the extract was concentrated, then
extracted with chloroform:methanol (9:1) and evaporated to dryness. This was subjected to silica gel column
chromatography [30 cm x 3.0 cm (diameter)]. After the column was washed with chloroform, an orange band was
eluted with chloroform:methanol (100:1). This pigment was dissolved in ethanol and recrystallized at low
temperature to give 8 mg of a pure product. The analysis by its UV-visible absorption, ¹H-NMR, ¹³C-NMR and
FD-MS (m/e 568) spectra revealed that this substance had the same structure, except for stereochemistry, as
zeaxanthin (β,β-carotene-3,3'-diol). It was then dissolved in diethyl ether : isopentane : ethanol (5:5:2), and
the CD spectrum was measured. As a result, it was found that this substance had a 3R,3'R-stereochemistry
[Phytochemistry, 27, 3605-3609 (1988)]. Therefore, it was identified as zeaxanthin (β,β-carotene-3R,3'R-diol).




(3) Identification of β-carotene

The cells of E. coli (pCAR16) (exhibiting an orange color) harvested from 3 liters of LB medium were extracted 3
times with 500 ml portions of cold methanol at low temperature, and the methanol extract was further
extracted with 1.5 liter of hexane. The hexane layer was concentrated and subjected to silica gel column
chromatography [30 cm x 3.0 cm (diameter)]. Development and elution were conducted with hexane:ethyl acetate
(50:1) to collect an orange band. The orange fraction was concentrated and recrystallized from ethanol to
give 8 mg (reduced weight without moisture). This substance was presumed to be β-carotene from
its UV-visible absorption spectrum, and a molecular weight of 536 by FD-MS spectrum also supported this
presumption. Upon comparing this substance with the authentic sample (Sigma) of β-carotene by ¹³C-NMR
spectrum, all of the chemical shifts of the carbons were identical with each other. Thus, this substance was
identified as β-carotene (all-trans-β,β-carotene, of which the structure is illustrated below). It was also
confirmed by the similar method that E. coli (pCAR16delB) accumulated the same β-carotene as described
above. Its yield was 2.0 mg/g dry weight, which corresponded to 2 - 8 times (per dry weight) the total
carotenoid yield in carrot (Kintokininjin) culture cells described in Soshikibaiyou (The Tissue Culture), 13,
379-382 (1987).





(4) Identification of lycopene

The cells of E. coli (pCAR16delC) (exhibiting a red color) harvested from 3 liters of LB medium were extracted
once with 500 ml of cold methanol at low temperature, and the precipitate collected by centrifugation was
extracted again with 1.5 liter of chloroform. The chloroform layer was concentrated and subjected to silica gel
chromatography [30 cm x 3.0 cm (diameter)]. Development and elution were conducted with hexane:chloroform
(1:1) to collect a red band. This fraction was concentrated. This substance was presumed to be
lycopene from its UV-visible absorption spectrum, and a molecular weight of 536 by FD-MS spectrum also
supported this presumption. Upon comparing this substance with the authentic sample (Sigma) of lycopene
by ¹H-NMR spectrum, all of the chemical shifts of the hydrogens were identical with each other. When this
substance and the authentic sample were subjected to TLC with silica gel 60 (Merck) [developed with
hexane:chloroform (50:1)] and with RP-18 [developed with methanol:chloroform (4:1)], the migration
distances of these samples were completely equal to each other. Thus this substance was identified as
lycopene (all-trans-ψ,ψ-carotene, of which the structure is illustrated below). It was also confirmed by the
similar method that E. coli (pCAR-ADE) and E. coli (pCAR-ADEF) accumulated the same lycopene as
described above. The yield of the former was 2.0 mg/g dry weight, which corresponded to 2 times (per dry
weight) the total carotenoid yield in a hyperproduction derivative of carrot (Kintokininjin) culture cells
described in Soshikibaiyou (The Tissue Culture), 13, 379-382 (1987).



(5) Identification of phytoene

The cells of E. coli (pCAR-AE) harvested from 1.5 liter of 2 x YT medium were extracted twice with 200
ml portions of acetone and twice with 100 ml portions of hexane, and the extract was evaporated to dryness.
This was subjected to silica gel chromatography [30 cm x 3.0 cm (diameter)]. Development and elution were
conducted with hexane:chloroform (1:1) to collect a band which had a strong UV absorption, and it was
confirmed to be phytoene by its UV absorption spectrum. It was further subjected to LH-20 column
chromatography [30 cm x 3.0 cm (diameter)]. Development and elution were conducted with chloroform:methanol
(1:1) to give 4 mg of a pure product. The comparison of the ¹H-NMR spectrum of this substance with the ¹H-NMR
spectra of trans- and cis-phytoene [J. Magnetic Resonance, 10, 43-50 (1973)] showed this substance to be a
mixture of the trans- and cis-isomers. Isomerization from cis-isomer to trans-isomer hardly occurs, and thus it
was judged that such a mixture was produced as a result of trans-cis isomerization in the course of the
purification. Therefore, it was concluded that the original phytoene was the trans-type phytoene
(all-trans-phytoene, of which the structure is shown below). It was also confirmed by the similar method that
E. coli (pCAR25delD) accumulated the same phytoene as described above.



Experimental Example 5: Identification of carotenoid biosynthesis genes

From the facts that E. coli (pCAR25) produced zeaxanthin-diglucoside and that E. coli (pCAR25delB),
harboring a plasmid in which zexB had been removed from pCAR25, accumulated zeaxanthin, it was found
that the zexB gene encoded the glycosylation enzyme which was capable of converting zeaxanthin into
zeaxanthin-diglucoside. Similarly, from the fact that E. coli (pCAR16delB), harboring a plasmid in which
zexF had been removed from pCAR25delB, accumulated β-carotene, it was found that the zexF gene
encoded the hydroxylation enzyme which was capable of converting β-carotene into zeaxanthin. Similarly,
from the fact that E. coli (pCAR-ADE), harboring a plasmid in which zexC had been removed from
pCAR16delB, accumulated lycopene, it was found that the zexC gene encoded the cyclization enzyme
which was capable of converting lycopene into β-carotene. Also, E. coli (pCAR-ADEF), carrying the
zexA, zexD and zexE genes required for producing lycopene together with the zexF gene encoding the
hydroxylation enzyme, was able to synthesize only lycopene. This demonstrates directly that the hydroxylation
reaction in carotenoid biosynthesis occurs after the cyclization reaction. Further, from the facts that E. coli
(pCAR-ADE) accumulated lycopene and that E. coli (pCAR-AE), harboring a plasmid in which the zexD gene had
been removed from pCAR-ADE, accumulated phytoene, it was found that the zexD gene encoded the
desaturation enzyme which was capable of converting phytoene into lycopene. E. coli (pCAR-A) and E. coli
(pCAR-E) were not able to produce phytoene. It was thought from this result that both the zexA and zexE
genes were required for producing phytoene in E. coli. zexE and zexA were identified as the gene for the
conversion of geranylgeranyl pyrophosphate into prephytoene pyrophosphate and that for the conversion of
prephytoene pyrophosphate into phytoene, respectively, by comparing their putative amino acid sequences with
those of the crtB and crtE gene products in a photosynthetic bacterium, Rhodobacter capsulatus [Mol. Gen.
Genet., 216, 254-268 (1988)]. From the analyses described above, all of the six zex genes have been identified
and the biosynthetic pathway of carotenoids has also been clarified. These results are listed in Fig. 8.

E. coli (pCAR25delE) accumulated no detectable carotenoid intermediate, while E. coli (pCAR25delA)
and E. coli (pCAR-CDE) were able to produce a small amount of carotenoids. That is to say, E. coli
(pCAR25delA) and E. coli (pCAR-CDE) produced 4% of zeaxanthin-diglucoside and 2% of β-carotene as
compared with E. coli (pCAR25) and E. coli (pCAR16delB), respectively. This result suggests that
the reaction from prephytoene pyrophosphate to phytoene may occur non-enzymatically, although
only in a trace yield.
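
The gene assignments deduced above can be summarized as a linear pathway, and the product expected to accumulate in each transformant follows from the first missing step. The sketch below is a deliberately strict model written for illustration: the names `PATHWAY`, `SUBSTRATE` and `end_product` are hypothetical, and the trace non-enzymatic conversion of prephytoene pyrophosphate suggested for E. coli (pCAR25delA) and E. coli (pCAR-CDE) is ignored.

```python
from typing import Iterable

# Host-supplied starting compound.
SUBSTRATE = "geranylgeranyl pyrophosphate"

# (gene, product of the step it catalyses), in pathway order.
PATHWAY = [
    ("zexE", "prephytoene pyrophosphate"),
    ("zexA", "phytoene"),
    ("zexD", "lycopene"),
    ("zexC", "beta-carotene"),
    ("zexF", "zeaxanthin"),
    ("zexB", "zeaxanthin-diglucoside"),
]


def end_product(genes: Iterable[str]) -> str:
    """Return the compound expected to accumulate for a set of working genes."""
    present = set(genes)
    product = SUBSTRATE
    for gene, next_product in PATHWAY:
        if gene not in present:
            break  # the pathway stops at the first missing enzyme
        product = next_product
    return product


# Gene sets taken from Table 1.
PLASMIDS = {
    "pCAR25": ["zexA", "zexB", "zexC", "zexD", "zexE", "zexF"],
    "pCAR25delB": ["zexA", "zexC", "zexD", "zexE", "zexF"],
    "pCAR16delB": ["zexA", "zexC", "zexD", "zexE"],
    "pCAR-ADE": ["zexA", "zexD", "zexE"],
    "pCAR-ADEF": ["zexA", "zexD", "zexE", "zexF"],
    "pCAR-AE": ["zexA", "zexE"],
}

for name, genes in PLASMIDS.items():
    print(f"E. coli ({name}): {end_product(genes)}")
```

Note that pCAR-ADEF stops at lycopene even though zexF is present, which mirrors the observation that hydroxylation occurs only after cyclization.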

As described above, the detailed biosynthetic pathway of carotenoids, including general and well-known
carotenoids such as lycopene, β-carotene and zeaxanthin and a water-soluble carotenoid such as
zeaxanthin-diglucoside, was elucidated for the first time, and the gene cluster useful for their biosynthesis
was acquired for the first time. In this connection, lycopene, β-carotene and zeaxanthin which
were produced by the genes in the aforementioned Experimental Examples were stereochemically identical
with those derived from higher plants [T.W. Goodwin: "Plant Pigments", Academic Press (1988)].
As for zeaxanthin-diglucoside, only its isolation from a plant has been reported [Pure & Appl. Chem., 47,
121-128 (1976)], and its isolation from microorganisms has not been reported.



Experimental Example 6: Synthesis of carotenoids in Zymomonas

Zymomonas mobilis is a facultative anaerobic ethanol-producing bacterium. It has a higher
ethanol-producing rate than yeast (Saccharomyces cerevisiae), so that it is a preferable candidate as a fuel
alcohol-producing bacterium in future. Also, Zymomonas has a special metabolic pathway, the Entner-Doudoroff
pathway, rather than the glycolytic pathway, and cannot produce carotenoids. In order to add further value to
this bacterium, the carotenoid biosynthesis genes were introduced into Zymomonas.

The 7.6 kb fragment containing the DNA sequence shown in Fig. 7 was cut out from the hybrid plasmid
pCAR1 by KpnI digestion and treated with DNA polymerase I (Klenow enzyme). The fragment thus treated
was ligated to the EcoRV site of the cloning vector pZA22 for Zymomonas [see Agric. Biol. Chem., 50,
3201-3203 (1986) and Japanese Patent Laid-Open Publication No. 228278/87] to construct a hybrid plasmid
pZACAR1. Also, the fragment spanning positions 1-6009 of the DNA sequence in Fig. 7 was cut out from
pCAR16 by KpnI/EcoRI digestion and treated with DNA polymerase I (Klenow enzyme). The fragment thus
treated was ligated to the EcoRV site of pZA22 to construct a hybrid plasmid pZACAR16. The orientation of
the inserted fragment in each of these plasmids was opposite to that of the Tc^r gene, taking the
orientation in Fig. 7 as the normal one. These plasmids were introduced into Z. mobilis NRRL B-14023 by
conjugal transfer with the helper plasmid pRK2013 (ATCC 37159) and were stably maintained in this strain.
Z. mobilis NRRL B-14023 into which pZACAR1 or pZACAR16 had been introduced turned yellow and produced
zeaxanthin-diglucoside in an amount of 0.28 mg/g dry weight and β-carotene in an amount of 0.14 mg/g dry
weight, respectively. Therefore, carotenoids were successfully produced in Zymomonas by the carotenoid
biosynthesis genes according to the present invention.

Deposition of Microorganism

The microorganism relating to the present invention has been deposited at the Fermentation Research
Institute, Japan, as follows:

Microorganism                     Accession number    Date of deposit
Escherichia coli JM109 (pCAR1)    FERM BP-2377        April 11, 1989



Claims 

1. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting prephytoene
pyrophosphate into phytoene and whose amino acid sequence corresponds substantially to the amino acid
sequence from A to B shown in Figs. 1 (a) and (b).

2. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting zeaxanthin
into zeaxanthin-diglucoside and whose amino acid sequence corresponds substantially to the amino acid
sequence from C to D shown in Figs. 2 (a) and (b).

3. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting lycopene
into β-carotene and whose amino acid sequence corresponds substantially to the amino acid sequence
from E to F shown in Figs. 3 (a) and (b).

4. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting phytoene
into lycopene and whose amino acid sequence corresponds substantially to the amino acid sequence from
G to H shown in Figs. 4 (a), (b) and (c).

5. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting geranylgeranyl
pyrophosphate into prephytoene pyrophosphate and whose amino acid sequence corresponds
substantially to the amino acid sequence from I to J shown in Figs. 5 (a) and (b).

6. A DNA sequence encoding a polypeptide which has an enzymatic activity for converting β-carotene
into zeaxanthin and whose amino acid sequence corresponds substantially to the amino acid sequence from
K to L shown in Fig. 6.

7. A process for producing a carotenoid compound which is selected from the group consisting of
prephytoene pyrophosphate, phytoene, lycopene, β-carotene, zeaxanthin and zeaxanthin-diglucoside, which
comprises transforming a host with at least one of the DNA sequences according to claims 1-6, and
culturing the transformant.
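
The six conversions recited in claims 1-6 form a linear chain, which can be sketched as a small lookup. This is only an illustrative model, not part of the claimed subject matter: the names `PATHWAY` and `end_product` are hypothetical, the host is assumed to supply geranylgeranyl pyrophosphate, and the trace non-enzymatic conversion of prephytoene pyrophosphate mentioned in the Experimental Examples is deliberately ignored.

```python
# Illustrative model only: each step is (substrate, product, claim number).
# The order follows the conversions recited in claims 1-6.
PATHWAY = [
    ("geranylgeranyl pyrophosphate", "prephytoene pyrophosphate", 5),
    ("prephytoene pyrophosphate", "phytoene", 1),
    ("phytoene", "lycopene", 4),
    ("lycopene", "beta-carotene", 3),
    ("beta-carotene", "zeaxanthin", 6),
    ("zeaxanthin", "zeaxanthin-diglucoside", 2),
]


def end_product(claims_present):
    """Return the last compound reachable when only the given genes are present.

    Assumption: the host provides geranylgeranyl pyrophosphate, and a missing
    gene blocks the chain completely (trace side reactions are ignored).
    """
    compound = PATHWAY[0][0]
    for _substrate, product, claim in PATHWAY:
        if claim not in claims_present:
            break  # chain stops at the first missing enzyme
        compound = product
    return compound


print(end_product({1, 2, 3, 4, 5, 6}))
print(end_product({1, 3, 4, 5}))
```

Under these assumptions, a transformant carrying all six sequences ends at zeaxanthin-diglucoside, while one lacking the sequences of claims 6 and 2 stops at beta-carotene.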
230 240 250 260 270 280

ATGACGGTCTGCGCAAAAAAACACGTTCATCTCACTCGCGATGCTGCGGAGCAGTTACTG
MetThrValCysAlaLysLysHisValHisLeuThrArgAspAlaAlaGluGlnLeuLeu
A

290 300 310 320 330 340

GCTGATATTGATCGACGCCTTGATCAGTTATTGCCCGTGGAGGGAGAACGGGATGTTGTG
AlaAspIleAspArgArgLeuAspGlnLeuLeuProValGluGlyGluArgAspValVal

350 360 370 380 390 400

GGTGCCGCGATGCGTGAAGGTGCGCTGGCACCGGGAAAACGTATTCGCCCCATGTTGCTG
GlyAlaAlaMetArgGluGlyAlaLeuAlaProGlyLysArgIleArgProMetLeuLeu

410 420 430 440 450 460

TTGCTGACCGCCCGCGATCTGGGTTGCGCTGTCAGCCATGACGGATTACTGGATTTGGCC
LeuLeuThrAlaArgAspLeuGlyCysAlaValSerHisAspGlyLeuLeuAspLeuAla

470 480 490 500 510 520

TGTGCGGTGGAAATGGTCCACGCGGCTTCGCTGATCCTTGACGATATGCCCTGCATGGAC
CysAlaValGluMetValHisAlaAlaSerLeuIleLeuAspAspMetProCysMetAsp

530 540 550 560 570 580

GATGCGAAGCTGCGGCGCGGACGCCCTACCATTCATTCTCATTACGGAGAGCATGTGGCA
AspAlaLysLeuArgArgGlyArgProThrIleHisSerHisTyrGlyGluHisValAla

590 600 610 620 630 640

ATACTGGCGGCGGTTGCCTTGCTGAGTAAAGCCTTTGGCGTAATTGCCGATGCAGATGGC
IleLeuAlaAlaValAlaLeuLeuSerLysAlaPheGlyValIleAlaAspAlaAspGly

650 660 670 680 690 700

CTCACGCCGCTGGCAAAAAATCGGGCGGTTTCTGAACTGTCAAACGCCATCGGCATGCAA
LeuThrProLeuAlaLysAsnArgAlaValSerGluLeuSerAsnAlaIleGlyMetGln

710 720 730 740 750 760

GGATTGGTTCAGGGTCAGTTCAAGGATCTGTCTGAAGGGGATAAGCCGCGCAGCGCTGAA
GlyLeuValGlnGlyGlnPheLysAspLeuSerGluGlyAspLysProArgSerAlaGlu

770 780 790 800 810 820

GCTATTTTGATGACGAATCACTTTAAAACCAGCACGCTGTTTTGTGCCTCCATGCAGATG
AlaIleLeuMetThrAsnHisPheLysThrSerThrLeuPheCysAlaSerMetGlnMet

830 840 850 860 870 880

GCCTCGATTGTTGCGAATGCCTCCAGCGAAGCGCGTGATTGCCTGCATCGTTTTTCACTT
AlaSerIleValAlaAsnAlaSerSerGluAlaArgAspCysLeuHisArgPheSerLeu

FIG. 1 (a)
890 900 910 920 930 940

GATCTTGGTCAGGCATTTCAACTGCTGGACGATTTGACCGATGGCATGACCGACACCGGT
AspLeuGlyGlnAlaPheGlnLeuLeuAspAspLeuThrAspGlyMetThrAspThrGly

950 960 970 980 990 1000

AAGGATAGCAATCAGGACGCCGGTAAATCGACGCTGGTCAATCTGTTAGGCCCGAGGGCG
LysAspSerAsnGlnAspAlaGlyLysSerThrLeuValAsnLeuLeuGlyProArgAla

1010 1020 1030 1040 1050 1060

GTTGAAGAACGTCTGAGACAACATCTTCAGCTTGCCAGTGAGCATCTCTCTGCGGCCTGC
ValGluGluArgLeuArgGlnHisLeuGlnLeuAlaSerGluHisLeuSerAlaAlaCys

1070 1080 1090 1100 1110 1120

CAACACGGGCACGCCACTCAACATTTTATTCAGGCCTGGTTTGACAAAAAACTCGCTGCC
GlnHisGlyHisAlaThrGlnHisPheIleGlnAlaTrpPheAspLysLysLeuAlaAla

1130
GTCAGTTAA
ValSer***

B
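
The pairing of each nucleotide line with its amino acid line in FIGS. 1 to 6 can be checked by translating codons with the standard genetic code. The following is a minimal sketch under that assumption; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification, and the three-letter output style simply mirrors the figures (including `***` for a stop codon).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third base varying fastest).
BASES = "TCAG"
ONE_LETTER = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "***",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}


def translate(dna):
    """Translate a DNA string, read in frame from its first base, codon by codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Start of the FIG. 1 (a) reading frame and the last codons of FIG. 1 (b).
print(translate("ATGACGGTCTGCGCAAAAAAACAC"))
print(translate("GTCAGTTAA"))
```

Running the same function over a full 60-nucleotide line of a figure gives the 20 residues printed beneath it, which is a convenient way to spot transcription errors in either line.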
1150 1160 1170 1180 1190 1200

ATGAGCCATTTCGCGGCGATCGCACCGCCTTTTTACAGCCATGTTCGCGCATTACAGAAT 
Me tSerHisPheAlaAl al 1 eAl aProProPheTyrSerHi sVal ArgAl aLeuGl nAsn 



1210 1220 1230 1240 1250 1260 

CTCGCTCAGGAACTGGTCGCGCGCGGTCATCGGGTGACCTTTATTCAGCAATACGATATT 
LeuAlaGlnGluLeuValAlaArgGlyHisArgValThrPhel leGlnGlnTyrAspIle 



1270 1280 1290 1300 1310 1320 

AAACACTTGATCGATAGCGAAACCATTGGATTTCATTCCGTCGGGACAGACAGCCATCCC 
LysHi sLeuIl eAspScrGl uThr 1 1 eGl yPheHisSerVal Gl yThrAspSerHi sPro 



1330 1340 1350 1360 1370 1380 

CCCGGCGCGTTAACGCGCGTGCTACACCTGGCGGCTCATCCTCTGGGGCCGTCAATGCTG 
ProGlyAlaLeuThrArsYalLeuHisLeuAl aAl aHi sProLeuGI yProSerMe tLeu 



1390 1400 1410 1420 1430 1440 

AAGCTCATCAATGAAATGGCGCGCACCACCGATATGCTGTGCCGCGAACTCCCCCAGGCA 
LysLeuI 1 eAsnGluMe tAl aAr«ThrThrAspMetLeuCysAr«Gl uLeuProGInAl a 



1450 1460 1470 1480 1490 1500 

TTTAACGATCTGGCCGTCGATGGCGTCATTGTTGATCAAATGGAACCGGCAGGCGCGCTC 
PheAsnAspLeuAlaValAspGlyVal IleValAspGInMetGluProAlaGlyAlaLeu 



1510 1520 1530 1540 1550 1560 

GTTGCTGAAGCACTGGGACTGCCGTTTATCTCTGTCGCCTGCGCGCTGCCTCTCAATCGT 
Yal AlaGluAlaLeuGlyLeuProPhel leSerVal AlaCysAl aLeuProLeuAsnArg 



1570 1580 1590 1600 1610 1620 

GAACCGGATATGCCCCTGGCGGTTATGCCTTTCGAATACGGGACCAGCGACGCGGCTCGC 
GluProAspMe tProLeuAlaValMe tProPheGl uTyrGl yThrSerAspAl aA 1 aArg 



1630 1640 1650 1660 1670 1680 

GAACGTTATGCCGCCAGTGAAAAAATTTATGACTGGCTAATGCGTCGTCATGACCGTGTC 
GluArgTyrAlaAlaSerGIuLysI 1 eTyrAspTrpLeuMe t ArgArgHl sAspArgVal 



1690 1700 1710 " 1720 1730 1740 

ATTGCCGAACACAGCCACAGAATGGGCTTAGCCCCCCGGCAAAAGCTTCACCAGTGTTTT 
r leAlaGluHisSerHisArgMetGlyLeuAl aProArgGlnLysLeuHlsGlnCysPhe 



1750 1760 1770 1780 1790 1800 

TCGCCACTGGCGCAAATCAGCCAGCTTGTTCCTGAACTGGATTTTCCCCGCAAAGCGTTA 
SerProLeuAlaGlnl I eSerGl nLeuVal ProGI uLeuAspPheProArgLysAl aLeu 



FIG. 2 (a)
1810 1820 1830 1840 1850 1860

CCGGCTTGTTTTCATGCCGTCGGGCCTCTGCGCGAAACGCACGCACCGTCAACGTCTTCA 
ProAl aCysPheHI sAI aValGIyProLcuAr^GluThrHisAlaProSerThrSerSer 



1870 1880 1890 1900 1910 1920 

TCCCGTTATTTTACATCCTCAGAAAAACCCCGGATTTTCGCCTCGCTGGGCACGCTTCAG 
SerArgTyrPheThrSerSerGluLysProAr^I 1 ePheAl aSerLeuGl yThrLeuG 1 n 



1930 1940 1950 1960 1970 1980 

GGACACCGTTATGGGCTGTTTAAAACGATAGTGAAAGCCTGTGAAGAAATTGACGGTCAG 
G 1 y H I s Ar gTy rG 1 yLeuPh eLy s Th r II e V a 1 Ly s A 1 aCy s G 1 uG 1 u 1 1 e As pG 1 y G 1 n 



1990 2000 2010 2020 2030 2040 

CTCCTGTTAGCCCACTGTGGTCGTCTTACGGACTCTCAGTGTGAAGAGCTGGCGCGAAGC 
LeuLeuLeuAl aHIsCysGlyAr^LeuThrAspSerGlnCysGluGluLeuAlaArgSer 



2050 2060 2070 2080 2090 2100 

CGTCATACACAGGTGGTGGATTTTGCCGATCAGTCAGCCGCGCTGTCTCAGGCGCAGCTG 
Ar«Hi sThrGl nVal Val AspPheAl aAspGl nS«rAl aAl aLeuSerGl nAl aGl nLeu 



2110 2120 2130 2140 2150 2160 

GCGATCACCCACGGCGGCATGAATACGGTACTGGACGCGATTAATTACCGGACGCCCCTT 
Al all cThrHisGlyGl yMe tAsnThrValLeuAspAlal 1 eAsnTyrArgThrProLeu 



2170 2180 2190 2200 2210 2220 

TTAGCGCTTCCGCTGGCCTTTGATCAGCCCGGCGTCGCGTCACGCATCGTTTATCACGGC 
LeuAl aLeuProLeuAI aPheAspGl nProGlyVal Al aSerArgl 1 eValTyrHi sGl y 



2230 2240 2250 2260 2270 2280 

ATCGGCAAGCGTGCTTCCCGCTTTACCACCAGCCATGCTTTGGCTCGTCAGATGCGTTCA 
1 1 cGlyLysArgAlaSerArsPheThrThrSerHisAlaLeuAlaArgGlnMe tArgSer 



2290 2300 2310 2320 2330 2340 

TTGCTGACCAACGTCGACTTTCAGCAGCGCATGGCGAAAATCCAGACAGCCCTTCGTTTG 
LeuLeuThrAsnVal AspPheGlnGl nArgMe tAlaLys 1 1 eCl nThrAl aLeuArgLeu 



2350 2360 2370 2380 2390 2400 

GCAGGGGGCACCATGGCCGCTGCCGATATCATTGAGCAGGTTATGTGCACCGGTCAGCCT 
AlaGlyGlyThrMetAlaAlaAlaAspI 1 el leGluGlnValMetCysThrGlyGl nPro 

D 

2410 2420 2430 i 

GTCTTAAGIGGGAGCGGCTATGCAACCGCATTATGA 
ValLeuSerGI ySerGI yTyrAl aThrAl aLeu*** 



FIG. 2 (b)
2430 2440 2450 2460 2470 2480

ATGCAACCGCATTATGATCTGATTCTCGTGGGGGCTGGACTCGCGAATGGCCTTATCGCC 
.MetGlnProHisTyrAspLeuI leLeuValGlyAlaGlyLeuAl aAsnGlyLeuI leAl a 



2490 2500 . 2510 2S20 2530 2540 

CTGCGTCTTCAGCAGCAGCAACCTGATATGCGTATTTTGCTTATCGACGCCGCACCCCAG 
LeuArsLeuGl nGl nGl nGl nProAspMe t Ar« 1 1 eLeuLeuI I eAspAkaAI aProGl n 

.2550 2560 2570 2580 2590 2600 

GCGGGCGGGAATCATACGTGGTCATTTCACCACGATGATTTGACTGAGAGCCAACATCGT 
AlaG lyGlyAsnHisThrTrpSerPheHisHisAspAspLeuThrGluSerGlnHisAr^ 



2610 2620 2630 2640 2650 2660 

TGGATAGCTCCGCTGGTGGTTCATCACTGGCCCGACTATCAGGTACGCTTTCCCACACGC 
TrpI leAlaProLeuVal ValHisHlsTrpProAspTyrGlnValArgPheProThrArg 



2670 2680 2590 2700 2710 2720 

CGTCGTAAGCTGAACAGCGGCTACTTTTGTATTACTTCTCAGCGTTTCGCTGAGGTTTTA 
Ar^ArgLysLeuAsnSerG 1 yTyrPheCys 1 1 eThrSerGI nATffPheA 1 aGl uVal Leu 



2730 2740 2750 2760 2770 2780 

CAGCGACAGTTTGGCCCGCACTTGTGGATGGATACCGCGGTCGCAGAGGTTAATGCGGAA 
GlnArgGInPheGlyProHlsLeuTrpMe tAspThrAlaValAlaGluYal AsnAl aGlu 



2790 2800 2810 2820 2830 2840 

TCTGTTCGGTTGAAAAAGGGTCAGGTTATCGGTGCCCGCGCGGTGATTGACGGGCGGGGT 
SerVal ArsLeuLysLysGlyGl nYal I leGlyAl aArgAlaVal 1 1 eAspGl yArgGl y 



2850 2860 2870 2880 2890 2900 

TATGCGGCAAATTCAGCACTGAGCGTGGGCTTCCAGGCGTTTATTGGCCAGGAATGGCGA 
TyrAlaAlaAsnSerAlaLeuSerValGlyPheGlnAlaPhel leGIyGlnGluTrpArg 



2910 2920 2930 2940 2950 2960 

TTGAGCCACCCGCATGGTTTATCGTCTCCCATTATCATGGATGCCACGGTCGATCAGCAA 
LeuSerHisProHisGlyLeuSerSerProIlelleMetAspAlaThrValAspGlnGl n 



2970 2980 2990 3000 3010 3020 

AATGGTTATCGCTTCGTGTACAGCCTGCCGCTCTCGCCGACCAGATTGTTAATTGAAGAC 
AsnGlyTyrArgPheValTyrSerLeuProLeuSerProThrArgLeuLeuI 1 eGl uAsp 



3030 3040 3050 3060 3070 3080 

ACGCACTATATTGATAATGCGACATTAGATCCTGAATGCGCGCGGCAAAATATTTGCGAC 
ThrHlsTyrl 1 eAspAsnAl aThrLeuAspProGl uCys Al aArgGl nAsn I leCysAsp 



FIG. 3 (a) 
3090 3100 3110 3120 3130 3140

TATGCCGCGCAACAGGGTTGGCAGCTTCAGACACTGCTGCGAGAAGAACAGGGCGCCTTA 
TyrAlaAlaGlnGlnGlyTrpGlnLeuGlnThrLeuLeuArgGluGluGlnGlyAlaLeu 



3150 3160 3170 3180 3190 3200 

CCCATTACTCTGTCGGGCAATGCCGACGCATTCTGGCAGCAGCGCCCCCTGGCCTGTAGT 
Pro 1 1 eThrLeuSerG-l yAsnAl aAspAl aPheTrpGl nGl nArsProLeuAI aCysSer 



3210 3220 3230 3240 3250 3260 

GGATTACGTGCCGGTCTGTTCCATCCTACCACCGGCTATTCACTGCCGCTGGCGGTTGCC 
GlyLeuAr^AlaGlyLeuPheHisProThrThrGIyTyrSerLeuProLeuAlaValAl a 



3270 3280 3290 3300 3310 3320 

GTGGCCGACCGCCTGAGTGCACTTGATGTCTTTACGTCGGCCTCAATTCACCATGCCATT 
ValAlaAspArgLeuSerAlaLeuAspValPheThrSerAlaSerll eHisHisAlal le 



3330 3340 33S0 3360 3370 3380 

ACGCATTTTGCCCGCGAGCGCTGGCAGCAGCAGGGCTTTTTCCGCATGCTGAATCGCATG 
ThrHi sPheAl aArgGl uArgTrpGl nGl nGl nGlyPhePheAr^Me tLeuAsnArgMe t 



3390 3400 3410 3420 3430 3440 

CTGTTTTTAGCCGGACCCGCCGATTCACGCTGGCGGGTTATGCAGCGTTTTTATGGTTTA 
LeuPheLeuAl aG 1 yProAl aAspSerArgXrpArsVal Me tGl nArgPheTyrGl yLeu 



3450 3460 3470 3480 3490 3500 

CCTGAAGATTTAATTGCCCGTTTTTATGCGGGAAAACTCACGCTGACCGATCGGCTACGT 
ProGluAspLeuI leAlaArsPheTyrAlaGlyLysLeuThrLeuThrAspArgLeuArg 

3510. 3520 3530 3540 3550 3560 

ATTCTGAGCGGCAAGCCGCCTGTTCCGGTATTAGCAGCATTGCAAGCCATTATGACGACT 
I leLeuSerGlyLysProProValProValLeuAlaAlaLeuGl nAlal leMetThrThr 



FIG. 3 (b) 



3570 
CATCGTTAA 
Hi sArg^** 

F 
3590 3600 3610 3620 3630 3640

ATGAAACCAACTACGGTAATTGGTGCAGGCTTCGGTGGCCTGGCACTGGCAATTCGTCTA 
MetLysProThrThrYal I leGlyAl aGIyPh.eGIyGlyLeuAl aLeuAI al 1 eArgLeu 



3650 3660 3670 3680 3690 3700

CAAGCTGCGGGGATCCCCGTCTTACTGCTTGAACAACGTGATAAACCCGGCGGTCGGGCT 
Gl nAl aAl aGl y 1 1 ePro Va 1 LeuLeuLeuGl uGl nArgAspLysProG I yG 1 yArgA 1 a 



3710 3720 3730 3740 3750 3760 

TATGTCTACGAGGATCAGGGGTTTACCTTTGATGCAGGCCCGACGGTTATCACCGATCCC 
TyrValTyrGluAspGlnGlyPheThrPheAspAl aGlyProThrVal 1 1 eThrAspPro 



3770 3780 3790 3800 3810 3820 

AGTGCCATTGAAGAACTGTTTGCACTGGCAGGAAAACAGTTAAAAGAGTATGTCGAACTG 
SerAl al I eGluGluLeuPhcAl aLeuAl aGlyLysGl nLeuLysGluTyrValGl uLeu 



3830 3840 3850 3860 3870 3880 

CTGCCGGTTACGCCGTTTTACCGCCTGTGTTGGGAGTCAGGGAAGGTCTTTAATTACGAT 
LeuProValThrProPheTyrAr^LeuCysTrpGl uSerGlyLysValPheAsnTyrAsp 



3890 3900 3910 3920 3930 3940 

AACGATCAAACCCGGCTCGAAGCGCAGATTCAGCAGTTTAATCCCCGCGATGTCGAAGGT 
AsnAspGlnThrAr£:LeuGluAl aGl nil eGl nGl nPheAsnProArgAspValGl uGly 



3950 3960 3970 3980 3990 4000 

TATCGTCAGTTTCTGGACTATTCACGCGCGGTGTTTAAAGAAGGCTATCTAAAGCTCGGT 
TyrArgGl nPheLeuAspTyrSerArgAl aValPheLysGl uGl yTyrLeuLysLeuGl y 



4010 4020 4030 4040 4050 4060 

ACTGTCCCTTTTTTATCGTTCAGAGACATGCTTCGCGCCGCACCTCAACTGGCGAAACTG 
ThrValPrdPheLeuSerPheAr^AspMe tLeuAr^Al aAl aProGl nLeuAi aLysLeu 



4070 4080 4090 4100 4110 4120 

CAGGCATGGAGAAGCGTTTACAGTAAGGTTGCCAGTTACATCGAAGATGAACATCTGCGC 
Gl nAlaTrpArgSerValTyrSerLysVal Al aSerTyr 1 1 eGluAspGluHisLeuArg 



4130 4140 41S0 4160 4170 4180 

CAGGCGTTTTCTTTCCACTCGCTGTTGGTGGGCGGCAATCCCTTCGCCACCTCATCCATT 
Gl nAl aPheSerPheHisSerLeuLeuYalGl yGl yAsnProPheAl aThrSerSer 1 1 e 



4190 4200 4210 4220 4230 4240 

TATACGTTGATACACGCGCTGGAGCGTGAGTGGGGCGTCTGGTTTCCGCGTGGCGGCACC 
TyrThrLeuI 1 eHi sAl aLeuGluAr«Gl uTrpGl yValTrpPheProArgGI yGl yThr 



FIG. 4 (a) 
4250 4260 4270 4280 4290 4300

GGCGCATTAGTTCAGGGGATGATAAAGCTGTTTCAGGATCTGGGTGGCGAAGTCGTGTTA 
GI yA 1 aLeuVa 1 Gl nGl yMe 1 1 1 eLysLeuPheG I nAspLeuG I yG 1 yG 1 uYa 1 Ya 1 Leu 



4310 4320 4330 4340 4350 4360 

AACGCCAGAGTCAGCCATATGGAAACGACAGGAAACAAGATTGAAGCCGTGCATTTAGAG 
AsnAlaArgValSerHisMe tGluThrThrGlyAsnLys 1 1 eGluAl aValHi sLeuGlu 



4370 4380 4390 4400 4410 4420 

GACGGTCGCAGGTTCCTGACGCAAGCCGTCGCGTCAAATGCAGATGTGGTTCATACCTAT 
AspGlyArgArgPheLeuThrGlnAlaValAlaSerAsnAlaAspVal ValHisThrTyr 



4430 4440 4450 4460 4470 4480 

CGCGACCTGTTAAGCCAGCACCCTGCCGCGGTTAAGCAGTCCAACAAACTGCAGACTAAG 
Ar^AspLeuLeuSerGl nHi sProAl aAI aVal LysGl nSerAsnLysLeuGl nThrLys 



4490 4S00 4510 4S20 4S30 4540 

CGCATGAGTAACTCTCTGTTTGTGCTCTATTTTGGTTTGAATCACCATCATGATCAGCTC 
ArgMetSerAsnSerLeuPheValLeuTyrPheGlyLeuAsnHisHisHi sAspGl nLeu 



4550 4560 4570 4580 4590 4600 

GCGCATCACACGGTTTGTTTCGGCCCGCGTTACCGCGAGCTGATTGACGAAATTTTTAAT 
AlaHisHisThrYalCysPheGIyProArgTyrArgGluLeuI leAspGluI lePheAsn 



4610 4620 4630 4640 4650 4660 

CATGATGGCCTCGCAGAGGACTTCTCACTTTATCTGCACGCGCCCTGTGTCACGGATTCG 
Hi sAspGl yLeuAl aGl uAspPheSerLeuTyrLeuHi sAl aProCysYa 1 ThrAspSer 



4670 4680 4690 4700 4710 4720 

TCACTGGCGCCTGAAGGTTGCGGCAGTTACTATGTGTTGGCGCCGGTGCCGCATTTAGGC 
SerLeuAlaProGluGlyCysGIySerTyrTyrYalLeuAl aProYalProHi sLeuGly 



4730 4740 4750 4760 4770 4780 

ACCGCGAACCTCGACTGGACGGTTGAGGGGCCAAAACTACGCGACCGTATTTTTGCGTAC 
ThrAlaAsnLeuAspTrpThrValGluGlyProLysLeuArgAspArgl lePheAI aTyr 



4790 4800 4810 4820 4830 4840 

CTTGAGCAGCATTACATGCCTGGCTTACGGAGTCAGCTGGTCACGCACCGGATGTTTACG 
LeuGluGlnHisTyrMetProGlyLeuArgSerGlnLeuYalThrHisArgMe tPheThr 



4850 4860 4870 4880 4890 4900 

CCGTTTGATTTTCGCGACCAGCTTAATGCCTATCATGGCTCAGCCTTTTCTGTGGAGCCC 
ProPheAspPheArgAspGlnLeuAsnAl aTyrHisGlySerAl aPheSerYal Gl uPro 



FIG. 4 (b)
4910 4920 4930 4940 4950 4960

GTTCTTACCCAGAGCGCCTGGTTTCGGCCGCATAACCGCGATAAAACCATTACTAATCTC 
ValLeuThrGl nSerAlaTrpPheArgProHisAsnArgAspLysThrl 1 eThrAsnLeu 



4970 4980 4990 5000 5010 5020 

TACCTGGTCGGCGCAGGCACGCATCCCGGCGCAGGCATTCCTGGCGTCATCGGCTCGGCA 
TyrLeuValGlyAlaGlyThrHisProGlyAlaGlyl leProGlyVal I leGlySerAl a 



5030 5040 5050 5060 

AAAGCGACAGCAGGTTTGATGCTGGAGGATCTGATTTGA 
LysAl aThrAl aGl yLeuMe tLeuGl uAspLeuI 1 e*** 



H 
5100 5110 5120 5130 5140 5150

ATGGCAGTTGGCTCGAAAAGTTTTGCGACAGCCTCAAAGTTATTTGATGCAAAAACCCGG 
,Me tAlaValGlySerLysSerPheAl aThrAI aSerLysLeuPheAspAl aLysThrArg 



5160 5170 5180 5190 5200 5210 

CGCAGCGTACTGATGCTCTACGCCTGGTGCCGCCATTGTGACGATGTTATTGACGATCAG 
ArgSerYalLeuMetLeuTyrAI aTrpCysArgHisCysAspAspVal 1 1 eAspAspGl n 



5220 5230 5240 5250 5260 5270 

ACGCTGGGCTTTCAGGCCCGGCAGCCTGCCTTACAAACGCCCGAACAACGTCTGATGCAA 
ThrLeuGlyPheGl nAl aArifGl nProAl aLeuGl nThrProGl uGl nArgLeuMe tGl n 



5280 5290 5300 5310 5320 5330 

CTTGAGATGAAAACGCGCCAGGCCTATGCAGGATCGCAGATGCACGAACCGGCGTTTGCG 
LeuGluMe tLysThrArgGl nAl aTyrAl aGl ySerGI nMe IHi sGl uProAl aPheAl a 



5340 5350 5360 5370 5380 5390 

GCTTTTCAGGAAGTGGCTATGGCTCATGATATCGCCCCGGCTTACGCGTTTGATCATCTG 
AlaPheGlnGluYal AlaMetAlaHisAspI leAlaProAlaTyrAl aPheAspHisLeu 



5400 5410 5420 5430 5440 54S0 

GAAGGCTTCGCCATGGATGTACGCGAAGCGCAATACAGCCAACTGGATGATACGCTGCGC 
GluGlyPheAlaMetAspVal ArgGluAI aGlnTyrSerGInLeuAspAspThrLeuArs 



5460 5470 5480 5490 SSOO 5510 

TATTGCTATCACGTTGCAGGCGTTGTCGGCTTGATGATGGCGCAAATCATGGGCGTGCGG 
TyrCysTyrHl3ValAlaGlyYalValGlyLeuMetMetAlaGlnI 1 eMe tGl yVal Ars 



5520 5530 5540 5550 5560 5570 

GATAACGCCACGCTGGACCGCGCCTGTGACCTTGGGCTGGCATTTCAGTTGACCAATATT 
AspAsnAl aThrLeuAspArgAl aCysAspLeuGlyLeuAl aPheGl nLeuThrAsn I 1 e ' 



5580 5590 5600 5610 5620 5630 

GCTCGCGATATTGTGGACGATGCGCATGCGGGCCGCTGTTATCTGCCGGCAAGCTGGCTG 
AlaArsAspI leVal AspAspAl aHisAI aGlyArgCysTyrLeuProAl aSerTrpLeu 



5640 5650 5660 5670 5680 5690 

GAGCATGAAGGTCTGAACAAAGAGAATTATGCGGCACCTGAAAACCGTCAGGCGCTGAGC 
GluHisGluGlyLeuAsnLysGl uAsnTyrAl aAlaProGluAsnArgGl nAl aLeuSer 



5700 5710 5720 5730 5740 5750 

CGTATCGCCCGTCGTTTGGTGCAGGAAGCAGAACCTTACTATTTGTCTGCCACAGCCGGC 
Argl 1 eAl aAr^ArgLeuValGl nGluAl aGIuProTyrTyrLeuSerAl aThrAl aGI y 
5760 5770 5780 5790 5800 5810

CTGGCAGGGTTGCCCCTGCGTTCCGCCTGGGCAATCGCTACGGCGAAGCAGGTTTACCGG 
LftuAlaGlyLftuProLeuArgSerAI aTrpAl al leAlaThrAl aLysGlnValTyrArg 



5820 5830 5840 5850 5860 5870 

AAAATAGGTGTCAAAGTTGAACAGGCCGGTCAGCAAGCCTGGGATCAGCGGCAGTCAACG 
Lysl leGlyValLysValGluGlnAlaGlyGl nGl nAl aTrpAspGl nArgGl nSerXhr 



5880 5890 5900 5^10 5920 5930 

ACCACGCCCGAAAAATTAACGCTGCTGCTGGCCGCCTCTGGTCAGGCCCTTACTTCCCGG 
ThrThrProGluLysLeuThrLeuLeuLeuAl aAlaSerGlyGl nAlaLeuThrSerArg 



5940 5950 5960 5970 5980 

ATGCGGGCTCATCCTCCCCGCCCTGCGCATCTCTGGCAGCGCCCGCTCTAG 
MetArgAlaHisProProArgProAl aHi sLeuTrpGl nArgProLeu.*** 



J 
6452

ATGTTGTGGATTTGGAATGCCCTGATCGTTTTCGTTACCGTGATTGGCATGGAAGTGATT
MetLeuTrpIleTrpAsnAlaLeuIleValPheValThrValIleGlyMetGluValIle

GCTGCACTGGCACACAAATACATCATGCACGGCTGGGGTTGGGGATGGCATCTTTCACAT
AlaAlaLeuAlaHisLysTyrIleMetHisGlyTrpGlyTrpGlyTrpHisLeuSerHis

CATGAACCGCGTAAAGGTGCGTTTGAAGTTAACGATCTTTATGCCGTGGTTTTTGCTGCA
HisGluProArgLysGlyAlaPheGluValAsnAspLeuTyrAlaValValPheAlaAla

TTATCGATCCTGCTGATTTATCTGGGCAGTACAGGAATGTGGCCGCTCCAGTGGATTGGC
LeuSerIleLeuLeuIleTyrLeuGlySerThrGlyMetTrpProLeuGlnTrpIleGly

GCAGGTATGACGGCGTATGGATTACTCTATTTTATGGTGCACGACGGGCTGGTGCATCAA
AlaGlyMetThrAlaTyrGlyLeuLeuTyrPheMetValHisAspGlyLeuValHisGln

CGTTGGCCATTCCGCTATATTCCACGCAAGGGCTACCTCAAACGGTTGTATATGGCGCAC
ArgTrpProPheArgTyrIleProArgLysGlyTyrLeuLysArgLeuTyrMetAlaHis

CGTATGCATCACGCCGTCAGGGGCAAAGAAGGTTGTGTTTCTTTTGGCTTCCTCTATGCG
ArgMetHisHisAlaValArgGlyLysGluGlyCysValSerPheGlyPheLeuTyrAla

CCGCCCCTGTCAAAACTTCAGGCGACGCTCCGGGAAAGACATGGCGCTAGAGCGGGCGCT
ProProLeuSerLysLeuGlnAlaThrLeuArgGluArgHisGlyAlaArgAlaGlyAla

5925

GCCAGAGATGCGCAGGGCGGGGAGGATGAGCCCGCATCCGGGAAGTAA
AlaArgAspAlaGlnGlyGlyGluAspGluProAlaSerGlyLys***



K 



L 
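
The position labels of this reading frame run downward (from about 6452 to 5925), which is consistent with it lying on the strand complementary to the one printed in FIG. 7. That reading is an inference from the printed sequences rather than a statement in the text above, and it can be checked with a reverse-complement operation. The helper name `reverse_complement` below is illustrative.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Positions 6441-6452 of the FIG. 7 strand read AATCCACAACAT; their reverse
# complement is the first four codons of the frame above (ATG TTG TGG ATT).
print(reverse_complement("AATCCACAACAT"))

# Positions 5925-5933 of FIG. 7 read TTACTTCCC; their reverse complement is
# the last three codons of the frame above (GGG AAG TAA).
print(reverse_complement("TTACTTCCC"))
```

Applying the same function to the whole 5925-6452 stretch of FIG. 7 should reproduce the nucleotide lines of this figure, apart from any transcription errors in either listing.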
1 10 20 30 40 50

GGTACCGCAC GGTCTGCCAA TCCGACGGAG GTTTATGAAT TTTCCACCTT TTCCACAAGC 

70 80 90 100 110 

TCAACTAGTA TTAACGATGT GGATTTAGCA AAAAAAACCT GTAACCCTAA ATGTAAAATA 

130 140 150 160 170 

ACGGGTAAGC CTGCCAACCA TGTTATGGCA GATTAAGCGT CTTTTTGAAG GGCACCGCAT 

190 200 210 220 ^ 230 

CTTTCGCGTT GCCGTAAATG TATCCGTTTA TAAGGACAGC CCGAATGACG GTCTGCGCAA 

250 260 270 280 290 

AAAAACACGT TCATCTCACT CGCGATGCTG CGGAGCAGTT ACTGGCTGAT ATTGATCGAC 

310 320 330 340 350 

GCCTTGATCA GTTATTGCCC GTGGAGGGAG AACGGGATGT TGTGGGTGCC GCGATGCGTG 

370 380 390 400 410 

AAGGTGCGCT GGCACCGGGA AAACGTATTC GCCCCATGTT GCTGTTGCTG ACCGCCCGCG 

430 440 450 460 470 

ATCTGGGTTG CGCTGTCAGC CATGACGGAT TACTGGATTT GGCCTGTGCG GTGGAAATGG 

490 500 510 520 530 

TCCACGCGGC TTCGCTGATC CTTGACGATA TGCCCTGCAT GGACGATGCG AAGCTGCGGC

550 560 570 580 590 

GCGGACGCCC TACCATTCAT TCTCATTACG GAGAGCATGT GGCAATACTG GCGGCGGTTG 

610 620 630 640 650 

CCTTGCTGAG TAAAGCCTTT GGCGTAATTG CCGATGCAGA TGGCCTCACG CCGCTGGCAA 

670 680 690 700 710 

AAAATCGGGC GGTTTCTGAA CTGTCAAACG CCATCGGCAT GCAAGGATTG GTTCAGGGTG 

730 740 750 760 770 

AGTTCAAGGA TCTGTCTGAA GGGGATAAGC CGCGCAGCGC TGAAGCTATT TTGATGACGA 

790 800 810 820 830 

ATCACTTTAA AACCAGCACG CTGTTTTGTG CCTCCATGCA GATGGCCTCG ATTGTTGCGA 

850 860 870 880 890 

ATGCCTCCAG CGAAGCGCGT GATTGCCTGC ATCGTTTTTC ACTTGATCTT GGTCAGGCAT 

910 920 930 940 950 

TTCAACTGCT GGACGATTTG ACCGATGGCA TGACCGACAC CGGTAAGGAT AGCAATCAGG 

970 980 990 1000 1010 

ACGCCGGTAA ATCGACGCTG GTCAATCTGT TAGGCCCGAG GGCGGTTGAA GAACGTCTGA 

1030 1040 1050 1060 1070 

GACAACATCT TCAGCTTGCC AGTGAGCATC TCTCTGCGGC CTGCCAACAC GGGCACGCCA 

1090 1100 1110 1120 1130? 

CTCAACATTT TATTCAGGCC TGGTTTGACA AAAAACTCGC TGCCGTCAGT TAAGGATGCT 
1150 1160 1170 1180 1190

GCATGAGCCA TTTCGCGGCG ATCGCACCGC CTTTTTACAG CCATGTTCGC GCATTACAGA

1210 1220 1230 1240 1250

ATCTCGCTCA GGAACTGGTC GCGCGCGGTC ATCGGGTGAC CTTTATTCAG CAATACGATA

1270 1280 1290 1300 1310

TTAAACACTT GATCGATAGC GAAACCATTG GATTTCATTC CGTCGGGACA GACAGCCATC

1330 1340 1350 1360 1370

CCCCCGGCGC GTTAACGCGC GTGCTACACC TGGCGGCTCA TCCTCTGGGG CCGTCAATGC

1390 1400 1410 1420 1430

TGAAGCTCAT CAATGAAATG GCGCGCACCA CCGATATGCT GTGCCGCGAA CTCCCCCAGG

1450 1460 1470 1480 1490

CATTTAACGA TCTGGCCGTC GATGGCGTCA TTGTTGATCA AATGGAACCG GCAGGCGCGC

1510 1520 1530 1540 1550

TCGTTGCTGA AGCACTGGGA CTGCCGTTTA TCTCTGTCGC CTGCGCGCTG CCTCTCAATC

1570 1580 1590 1600 1610

GTGAACCGGA TATGCCCCTG GCGGTTATGC CTTTCGAATA CGGGACCAGC GACGCGGCTC

1630 1640 1650 1660 1670

GCGAACGTTA TGCCGCCAGT GAAAAAATTT ATGACTGGCT AATGCGTCGT CATGACCGTG

1690 1700 1710 1720 1730

TCATTGCCGA ACACAGCCAC AGAATGGGCT TAGCCCCCCG GCAAAAGCTT CACCAGTGTT

1750 1760 1770 1780 1790

TTTCGCCACT GGCGCAAATC AGCCAGCTTG TTCCTGAACT GGATTTTCCC CGCAAAGCGT

1810 1820 1830 1840 1850

TACCGGCTTG TTTTCATGCC GTCGGGCCTC TGCGCGAAAC GCACGCACCG TCAACGTCTT

1870 1880 1890 1900 1910

CATCCCGTTA TTTTACATCC TCAGAAAAAC CCCGGATTTT CGCCTCGCTG GGCACGCTTC

1930 1940 1950 1960 1970

AGGGACACCG TTATGGGCTG TTTAAAACGA TAGTGAAAGC CTGTGAAGAA ATTGACGGTC

1990 2000 2010 2020 2030

AGCTCCTGTT AGCCCACTGT GGTCGTCTTA CGGACTCTCA GTGTGAAGAG CTGGCGCGAA

2050 2060 2070 2080 2090

GCCGTCATAC ACAGGTGGTG GATTTTGCCG ATCAGTCAGC CGCGCTGTCT CAGGCGCAGC

2110 2120 2130 2140 2150

TGGCGATCAC CCACGGCGGC ATGAATACGG TACTGGACGC GATTAATTAC CGGACGCCCC

2170 2180 2190 2200 2210

TTTTAGCGCT TCCGCTGGCC TTTGATCAGC CCGGCGTCGC GTCACGCATC GTTTATCACG

2230 2240 2250 2260 2270

GCATCGGCAA GCGTGCTTCC CGCTTTACCA CCAGCCATGC TTTGGCTCGT CAGATGCGTT
2290 2300 2310 2320 2330

CATTGCTGAC CAACGTCGAC TTTCAGCAGC GCATGGCGAA AATCCAGACA GCCCTTCGTT

2350 2360 2370 2380 2390

TGGCAGGGGG CACCATGGCC GCTGCCGATA TCATTGAGCA GGTTATGTGC ACCGGTCAGC

2410 2420 2430 2440 2450

CTGTCTTAAG TGGGAGCGGC TATGCAACCG CATTATGATC TGATTCTCGT GGGGGCTGGA

2470 2480 2490 2500 2510

CTCGCGAATG GCCTTATCGC CCTGCGTCTT CAGCAGCAGC AACCTGATAT GCGTATTTTG

2530 2540 2550 2560 2570

CTTATCGACG CCGCACCCCA GGCGGGCGGG AATCATACGT GGTCATTTCA CCACGATGAT

2590 2600 2610 2620 2630

TTGACTGAGA GCCAACATCG TTGGATAGCT CCGCTGGTGG TTCATCACTG GCCCGACTAT

2650 2660 2670 2680 2690

CAGGTACGCT TTCCCACACG CCGTCGTAAG CTGAACAGCG GCTACTTTTG TATTACTTCT

2710 2720 2730 2740 2750

CAGCGTTTCG CTGAGGTTTT ACAGCGACAG TTTGGCCCGC ACTTGTGGAT GGATACCGCG

2770 2780 2790 2800 2810

GTCGCAGAGG TTAATGCGGA ATCTGTTCGG TTGAAAAAGG GTCAGGTTAT CGGTGCCCGC

2830 2840 2850 2860 2870

GCGGTGATTG ACGGGCGGGG TTATGCGGCA AATTCAGCAC TGAGCGTGGG CTTCCAGGCG

2890 2900 2910 2920 2930

TTTATTGGCC AGGAATGGCG ATTGAGCCAC CCGCATGGTT TATCGTCTCC CATTATCATG

2950 2960 2970 2980 2990

GATGCCACGG TCGATCAGCA AAATGGTTAT CGCTTCGTGT ACAGCCTGCC GCTCTCGCCG

3010 3020 3030 3040 3050

ACCAGATTGT TAATTGAAGA CACGCACTAT ATTGATAATG CGACATTAGA TCCTGAATGC

3070 3080 3090 3100 3110

GCGCGGCAAA ATATTTGCGA CTATGCCGCG CAACAGGGTT GGCAGCTTCA GACACTGCTG



3130 31 40 3150 3160 3170 

CGAGAAGAAC AGGGCGCCTT ACCCATTACT CTGTCGGGCA ATGCCGACGC ATTCTGGCAG 

3190 3200 3210 3220 3230 

CAGCGCCCCC TGGCCTGTAG TGGATTACGT GCCGGTCTGT TCCATCCTAC CACCGGCTAT 

3250 3260 3270 3280 3290 

TCACTGCCGC TGGCGGTTGC CGTGGCCGAC CGCCTGAGTG CACTTGATGT CTTTACGTCG 

3310 3320 3330 3340 3350 

GCCTCAATTC ACCATGCCAT TACGCATTTT GCCCGCGAGC GCTGGCAGCA GCAGGGCTTT 

3370 3380 3390 3400 3410 

TTCCGCATGC TGAATCGCAT GCTGTTTTTA GCCGGACCCG CCGATTCACG CTGGCGGGTT 
3430 3440 3450 3460 3470

ATGCAGCGTT TTTATGGTTT ACCTGAAGAT TTAATTGCCC GTTTTTATGC GGGAAAACTC 

3490 3500 3510 3520 3530^ 

ACGCTGACCG ATCGGCTACG TATTCTGAGC GGCAAGCCGC CTGTTCCGGT ATTAGCAGCA 

3550 3560 3570 3580 3590

TTGCAAGCCA TTATGACGAC TCATCGTTAA AGAGCGACTA CATGAAACCA ACTACGGTAA 

3610 3620 3630 3640 3650 

TTGGTGCAGG CTTCGGTGGC CTGGCACTGG CAATTCGTCT ACAAGCTGCG GGGATCCCCG 

3670 3680 3690 3700 3710 

TCTTACTGCT TGAACAACGT GATAAACCCG GCGGTCGGGC TTATGTCTAC GAGGATCAGG 

3730 3740 3750 3760 3770 

GGTTTACCTT TGATGCAGGC CCGACGGTTA TCACCGATCC CAGTGCCATT GAAGAACTGT 

3790 3800 3810 3820 3830 

TTGCACTGGC AGGAAAACAG TTAAAAGAGT ATGTCGAACT GCTGCCGGTT ACGCCGTTTT 

3850 3860 3870 3880 3890 

ACCGCCTGTG TTGGGAGTCA GGGAAGGTCT TTAATTACGA TAACGATCAA ACCCGGCTCG 

3910 3920 3930 3940 3950 

AAGCGCAGAT TCAGCAGTTT AATCCCCGCG ATGTCGAAGG TTATCGTCAG TTTCTGGACT 

3970 3980 3990 4000 4010 

ATTCACGCGC GGTGTTTAAA GAAGGCTATC TAAAGCTCGG TACTGTCCCT TTTTTATCGT 

4030 4040 4050 4060 4070 

TCAGAGACAT GCTTCGCGCC GCACCTCAAC TGGCGAAACT GCAGGCATGG AGAAGCGTTT 

4090 4100 4110 4120 4130 

ACAGTAAGGT TGCCAGTTAC ATCGAAGATG AACATCTGCG CCAGGCGTTT TCTTTCCACT 

4150 4160 4170 4180 4190 

CGCTGTTGGT GGGCGGCAAT CCCTTCGCCA CCTCATCCAT TTATACGTTG ATACACGCGC 

4210 4220 4230 4240 4250 

TGGAGCGTGA GTGGGGCGTC TGGTTTCCGC GTGGCGGCAC CGGCGCATTA GTTCAGGGGA 

4270 4280 4290 4300 4310 

TGATAAAGCT GTTTCAGGAT CTGGGTGGCG AAGTCGTGTT AAACGCCAGA GTCAGCCATA

4330 4340 4350 4360 4370 

TGGAAACGAC AGGAAACAAG ATTGAAGCCG TGCATTTAGA GGACGGTCGC AGGTTCCTGA 

4390 4400 4410 4420 4430 

CGCAAGCCGT CGCGTCAAAT GCAGATGTGG TTCATACCTA TCGCGACCTG TTAAGCCAGC 

4450 4460 4470 4480 4490 

ACCCTGCCGC GGTTAAGCAG TCCAACAAAC TGCAGACTAA GCGCATGAGT AACTCTCTGT 

4510 4520 4530 4540 4550 

TTGTGCTCTA TTTTGGTTTG AATCACCATC ATGATCAGCT CGCGCATCAC ACGGTTTGTT 
4570 4580 4590 4600 4610

TCGGCCCGCG TTACCGCGAG CTGATTGACG AAATTTTTAA TCATGATGGC CTCGCAGAGG 

4630 4640 4650 4660 4670 

ACTTCTCACT TTATCTGCAC GCGCCCTGTG TCACGGATTC GTCACTGGCG CCTGAAGGTT 

4690 4700 4710 4720 4730 

GCGGCAGTTA CTATGTGTTG GCGCCGGTGC CGCATTTAGG CACCGCGAAC CTCGACTGGA 

4750 4760 4770 4780 4790 

CGGTTGAGGG GCCAAAACTA CGCGACCGTA TTTTTGCGTA CCTTGAGCAG CATTACATGC 

4810 4820 4830 4840 4850 

CTGGCTTACG GAGTCAGCTG GTCACGCACC GGATGTTTAC GCCGTTTGAT TTTCGCGACC 

4870 4880 4890 4900 4910 

AGCTTAATGC CTATCATGGC TCAGCCTTTT CTGTGGAGCC CGTTCTTACC CAGAGCGCCT 

4930 4940 4950 4960 4970 

GGTTTCGGCC GCATAACCGC GATAAAACCA TTACTAATCT CTACCTGGTC GGCGCAGGCA

4990 5000 5010 5020 5030 

CGCATCCCGG CGCAGGCATT CCTGGCGTCA TCGGCTCGGC AAAAGCGACA GCAGGTTTGA 

5050 5060 5070 5080 5090

TGCTGGAGGA TCTGATTTGA ATAATCCGTC GTTACTCAAT CATGCGGTCG AAACGATGGC

5110 5120 5130 5140 5150

AGTTGGCTCG AAAAGTTTTG CGACAGCCTC AAAGTTATTT GATGCAAAAA CCCGGCGCAG 

5170 5180 5190 5200 5210 

CGTACTGATG CTCTACGCCT GGTGCCGCCA TTGTGACGAT GTTATTGACG ATCAGACGCT 

5230 5240 5250 5260 5270 

GGGCTTTCAG GCCCGGCAGC CTGCCTTACA AACGCCCGAA CAACGTCTGA TGCAACTTGA 

5290 5300 5310 5320 5330 

GATGAAAACG CGCCAGGCCT ATGCAGGATC GCAGATGCAC GAACCGGCGT TTGCGGCTTT 

5350 5360 5370 5380 5390 

TCAGGAAGTG GCTATGGCTC ATGATATCGC CCCGGCTTAC GCGTTTGATC ATCTGGAAGG 

5410 5420 5430 5440 5450 

CTTCGCCATG GATGTACGCG AAGCGCAATA CAGCCAACTG GATGATACGC TGCGCTATTG 

5470 5480 5490 5500 5510 

CTATCACGTT GCAGGCGTTG TCGGCTTGAT GATGGCGCAA ATCATGGGCG TGCGGGATAA 

5530 5540 5550 5560 5570 

CGCCACGCTG GACCGCGCCT GTGACCTTGG GCTGGCATTT CAGTTGACCA ATATTGCTCG 

5590 5600 5610 5620 5630 

CGATATTGTG GACGATGCGC ATGCGGGCCG CTGTTATCTG CCGGCAAGCT GGCTGGAGCA 

5650 5660 5670 5680 5690 

TGAAGGTCTG AACAAAGAGA ATTATGCGGC ACCTGAAAAC CGTCAGGCGC TGAGCCGTAT 
5710 5720 5730 5740 5750

CGCCCGTCGT TTGGTGCAGG AAGCAGAACC TTACTATTTG TCTGCCACAG CCGGCCTGGC 

5770 5780 5790 5800 5810 

AGGGTTGCCC CTGCGTTCCG CCTGGGCAAT CGCTACGGCG AAGCAGGTTT ACCGGAAAAT 

5830 5840 5850 5860 5870 

AGGTGTCAAA GTTGAACAGG CCGGTCAGCA AGCCTGGGAT CAGCGGCAGT CAACGACCAC 

L 

5890 5900 5910 5920 5930

GCCCGAAAAA TTAACGCTGC TGCTGGCCGC CTCTGGTCAG GCCCTTACTT CCCGGATGCG 

5950 5960 5970 5980 5990 

GGCTCATCCT CCCCGCCCTG CGCATCTCTG GCAGCGCCCG CTCTAGCGCC ATGTCTTTCC 

6010 6020 6030 6040 6050 

CGGAGCGTCG CCTGAAGTTT TGACAGGGGC GGCGCATAGA GGAAGCCAAA AGAAACACAA 

6070 6080 6090 6100 6110 

CCTTCTTTGC CCCTGACGGC GTGATGCATA CGGTGCGCCA TATACAACCG TTTGAGGTAG 

6130 6140 6150 6160 6170 

CCCTTGCGTG GAATATAGCG GAATGGCCAA CGTTGATGCA CCAGCCCGTC GTGCACCATA 

6190 6200 6210 6220 6230 

AAATAGAGTA ATCCATACGC CGTCATACCT GCGCCAATCC ACTGGAGCGG CCACATTCCT 

6250 6260 6270 6280 6290 

GTACTGCCCA GATAAATCAG CAGGATCGAT AATGCAGCAA AAACCACGGC ATAAAGATCG 

6310 6320 6330 6340 6350

TTAACTTCAA ACGCACCTTT ACGCGGTTCA TGATGTGAAA GATGCCATCC CCAACCCCAG 

6370 6380 6390 6400 6410 

CCGTGCATGA TGTATTTGTG TGCCAGTGCA GCAATCACTT CCATGCCAAT CACGGTAACG 

6430 6440 6450 6460 6470

AAAACGATCA GGGCATTCCA AATCCACAAC ATAATTTCTC CGGTAGAGAC GTCTGGCAGC 

6490 6500 6510 6520 6530 

AGGCTTAAGG ATTCAArTTT AACAGAGATT AGCCGATCTG GCGGCGGGAA GGGAAAAAGG 

6550 6560 6570 6580 6590 

CGCGCCAGAA AGGCGCGCCA GGGATCAGAA GTCGGCTTTC AGAACCACAC GGTAGTTGGC 

6610 6620 6630 6640 6650 

TTTACCTGCA CGAACATGGT CCAGTGCATC GTTGATTTTC GACATCGGGA AGTACTCCAC 

6670 6680 6690 6700 6710 

TGTCGGCGCA ATATCTGTAC GGCCAGCCAG CTTCAGCAGT GAACGCAGCT GCGCAGGTGA 

6730 6740 6750 6760 6770 

ACCGGTTGAA GAACCCGTCA CGGCGCGGTC GCCTAAAATC AGGCTGAAAG CCGGGCACGT 

6790 6800 6810 6820 6830

CAAACGGCTT CAGTACGGCA CCCACGGTAT GGAACTTACC GCGAGGCGCC AGGGCCGCAA 
6850 6860 6870 6880 6890

AGTAGGGTTG CCAiSTCGAGA TCGACGGCGA CCGTGCTGAT AATCAGGTCA AACTGGCCCG 

6910 6918 
CCAGGCTTTT TAAAGCTT 